DOCUMENT RESUME

ED 218 075                                          SE 037 440

AUTHOR        Cole, Henry P.; Moss, Judson
TITLE         Evaluation of Learning for Continuing Engineering
              Education. Draft Report, Revised.
INSTITUTION   Kentucky Univ., Lexington. Coll. of Engineering.
SPONS AGENCY  National Science Foundation, Washington, D.C.
PUB DATE      Jan 80
GRANT         NSF/SED-80008; SED-78-22060
NOTE          394p.
EDRS PRICE    MF01/PC16 Plus Postage.
DESCRIPTORS   *Academic Achievement; *Course Evaluation; Curriculum
              Development; Curriculum Evaluation; Engineering;
              *Engineering Education; Evaluation Methods; Formative
              Evaluation; Higher Education; *Measures (Individuals);
              Pretests Posttests; *Professional Continuing
              Education; Science Education; Student
              Characteristics; Summative Evaluation; *Testing
IDENTIFIERS   National Science Foundation



ABSTRACT

This report documents the measurement of learning outcomes and
the formative/summative evaluation of courses as derived from the
activities and experiences of educators in many fields of
endeavor to determine how best to design courses of instruction
and how to measure the learning outcomes resulting from courses.
The report is divided into four major parts. The first part
describes a general typology of continuing education courses, the
characteristics of persons enrolled in such courses, and the use
of formative and summative evaluation in course development. The
second part is a detailed explanation concerning the use of
various types of testing procedures in measuring learning
outcomes. The third part describes alternative methods for
developing valid and reliable tests to measure individual
learning achievement and overall course effectiveness. Methods
for reporting the results of learning assessments to individuals
and groups are presented in the last section. The report is
written such that parts of it may be useful to individuals with
specific needs without having to read the whole text and is
facilitated by a detailed table of contents and a detailed
subject index. A precis of the 14 chapters is included in the
preface. (Author/JN)



Reproductions supplied by EDRS are the best that can be made
from the original document.



NSF/SED-80008



U.S. DEPARTMENT OF EDUCATION
NATIONAL INSTITUTE OF EDUCATION
EDUCATIONAL RESOURCES INFORMATION CENTER (ERIC)

This document has been reproduced as received from the person
or organization originating it.

Minor changes have been made to improve reproduction quality.

Points of view or opinions stated in this document do not
necessarily represent official NIE position or policy.



DRAFT REPORT

EVALUATION OF LEARNING FOR
CONTINUING ENGINEERING EDUCATION

Henry P. Cole & Judson Moss

University of Kentucky
Lexington, Kentucky



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 



TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



Evaluation of Learning
for
Continuing Engineering Education

Henry P. Cole, Ed.D.¹
assisted by
Judson Moss, Ed.D.²

Steering Committee, Measurement for Learning Outcomes
in Continuing Education for Scientists and Engineers

Project Members: Billy J. Barfield, Ph.D.,
Professor David K. Blythe, Frank Gohs, M.S.,
Edward Kifer, Ph.D., Warren Lacefield, M.S.,
Donna Mertens, Ph.D., and Joanne Wilson, M.S.
(University of Kentucky, Lexington, KY)

¹Professor and Chairman, Department of Educational Psychology,
College of Education, University of Kentucky, Lexington, KY

²Project Manager, Learning Outcomes Measurement Project, Office
of Continuing Education, College of Engineering, University of
Kentucky, Lexington, KY


Evaluation of Learning
for
Continuing Engineering Education

Measurement for Learning Outcomes in
Continuing Education for Scientists and Engineers

Office of Continuing Education,
College of Engineering, University of Kentucky
Lexington, Kentucky

October 1979

Revised January 1980

These Guidelines were prepared with the support of the
National Science Foundation, Grant No. SED78-22060. Any
opinions, findings, conclusions, or recommendations expressed
herein are those of the authors and do not necessarily
reflect the views of the National Science Foundation.



PREFACE

This book is written particularly for the director of
continuing education in engineering and related technical
fields. The book has developed out of the activities of a
group of persons with diverse talents and backgrounds, all
of whom were involved in a project concerned with the
measurement of learning outcomes for continuing education
courses in engineering. Members of this group consisted of
a highly experienced director of continuing education in
engineering; professors of continuing education courses in
engineering; experienced specialists in adult and higher
education with much experience in developing and teaching
continuing education courses in a variety of technical fields
including the health professions; and educational
psychologists expert and highly experienced in the areas of
measurement of human abilities and skills, the design of
tests, and educational program and course evaluation.

For a period of two years this group has met on a regular
basis in an ongoing seminar about the measurement of learning
outcomes for a variety of courses typical of those offered
in continuing education programs for engineers at many
colleges and universities and in other settings as well.
In addition, members of the group have worked together as
small teams in the actual development and use of methods and
procedures for the measurement of the learning outcomes
resulting from a number of continuing education courses in
engineering taught under the direction of the University of
Kentucky, College of Engineering, Office of Continuing
Education and Extension.

The book is a set of detailed guidelines which bring
together what the project team has learned about how to go
about the measurement of learning outcomes in this field.
Much of what is presented in the book concerning the
measurement of learning outcomes and the formative and
summative evaluation of courses has been derived from the
activities and experiences of many other educators in many
fields in efforts to determine how to best design courses of
instruction and how to measure the learning outcomes
resulting from courses. What is new is the bringing of all
of this information together in the context of adult education
and specifically in the area of engineering courses designed
for continuing education purposes. Consequently, the way in
which specific procedures bear on the measurement of learning
outcomes in these courses is well illustrated in many examples.

There are four main sections in this book. The first
part describes a general typology of continuing education
courses, the characteristics of persons enrolled in such
courses, and the use of formative and summative evaluation in
course development. The second part of the book is a
detailed explanation concerning the use of various types of
testing procedures in measuring learning outcomes. A third
section describes alternative methods for developing valid
and reliable tests to measure individual learning achievement
and overall course effectiveness. A fourth section presents
methods for reporting the results of learning assessments to
individuals and groups. The reader is advised to scan the
table of contents, read Chapters 1 and 14, and then to select
those sections of the text of most interest.

The book is written such that parts of it may be useful
to individuals with specific needs without their having to
read the whole text. This objective is further facilitated
by the very detailed table of contents and a detailed subject
index. There is information about a wide range of topics
which should be of value to directors of continuing education
in engineering as well as to the professionals who develop
and teach such courses.

Some chapters are more geared to the specific procedures
concerning how good courses may be developed and how their
learning outcomes may be measured in efficient and yet
effective ways. The information presented in these chapters
has relevance to the design and evaluation of any course,
although the examples presented are specifically in
engineering and in the continuing education context. Chapters
4, 5, 7, and 10 can be used as the basis for study by
instructors who have an interest in improving the
effectiveness of their course in reaching intended learning
outcomes.



Other chapters address matters of great interest to the
administrators and policy makers who operate and oversee
continuing education programs. Chapter 2 provides a
classification of four basic typologies of continuing
education courses and some insight into how the typology of
the course affects both its delivery and its evaluation.
Chapter 3 reminds the reader of the salient differences
between continuing education courses and the students
typically enrolled in them and the more traditional
undergraduate and graduate formal courses which are part of
college and university degree programs. Implications for the
staffing of such courses, their scheduling, and their
evaluation are noted.

Chapters 4 through 9 describe in great detail the
various types of tests and testing procedures which are
available and useful to the business of designing courses
and measuring their effectiveness in terms of the achievement
of learners on a variety of performance measures. All the
procedures are based upon stating in operational terms the
intended learning outcomes related to the performance of
persons in the work setting.

Chapter 10 outlines a very detailed but general and
useful set of procedures for insuring that well organized
courses and valid and reliable measurement procedures are
developed. Chapter 11 describes empirical ways of
determining the degree of validity and reliability of tests,
test items, and other learning assessment procedures which
have been developed for purposes of making inferences about
the degree of group and individual learning resulting from
a course.

Chapter 12 presents important limitations of tests as
assessment devices. This chapter is important because persons
should not misuse test data in the construction of inferences
about the degree of success of individual students and course
effectiveness.

Chapter 13 presents detailed procedures and information
about how to report the data gathered from course evaluations
and individual student achievement. How this information
should be used, with whom it should be shared, in what
manner, and for what purposes are all discussed.

Finally, Chapter 14 is a summary of the entire book in
that a set of recommendations is made for the development
of an "evaluated CEU." All the strengths and limitations of
the procedures available for the measurement of individual
learning by students and for judging the effectiveness of
courses are recalled. This information is used to conclude
with a recommendation that courses and programs be evaluated
and certified rather than individuals.

All of these chapters may be of interest and value to
the continuing educator charged with being accountable to
professional agencies, individuals, and administrative
superiors for the quality and effectiveness of the program
and courses operated under his or her jurisdiction.

The book has proven to be of interest to directors and
faculty in continuing education in other technical fields,
such as nursing and the allied health professions. Although
all of the examples provided are in continuing engineering
education, what is presented is expected to be generally
useful to continuing education activities in many areas.



Although this book was drafted by primarily one author,
it is truly a group effort. All of the persons listed on the
title page made continuing and important contributions. The
members of the project committee met over a two year period
for regular seminars during which much of the specific content
of the book and many of the topics were generated, discussed
at length, and often debated in a lively manner. In addition,
individual members of the project group were assigned to the
development of evaluation instruments and procedures and to
the use of these procedures in the evaluation of the learning
outcomes of a number of continuing education courses. The
courses are part of a program offered under the supervision
of Associate Dean David K. Blythe, Director of Continuing
Education, College of Engineering, University of Kentucky.

Dean Blythe's experience and knowledge of continuing
education and engineering were invaluable. He spent many
hours educating the members of the group in key matters,
directing the primary author to many sources which are cited
in the references in the book, and providing a tutorial in
many aspects of the topic with which the book deals. In
addition, Dean Blythe was responsible for bringing
the group together and for obtaining the funds from the
National Science Foundation which made the project possible.

Dr. Judson Moss, the Project Manager and an experienced
specialist in adult and continuing education, also provided
many of these same types of services and along with Dean
Blythe prepared the proposals which funded the project
activities. He also contributed much specialized knowledge
and expertise in the area of adult education and wrote the
portions of Chapter 3 which deal with adult learner
characteristics. In addition he coordinated and organized
all the multiple activities of the project members, compiled
all of the data from the studies completed, and collected,
edited, wrote, and rewrote all of the project reports with
the assistance of other project members. In addition,
Dr. Moss continually directed the primary author of the book
to important works and studies in continuing education in
engineering and in other related fields. Furthermore,
Dr. Moss edited and proofread all of the chapters in the
book as they were drafted by the primary author. His cheerful
and very capable assistance in this task and in the much
more complex task of managing the project is greatly
appreciated and was essential to the completion of the
project activities and the completion of the book.
•• . • ' * - L * 

Mr. Warren Lacefield, Mr. Frank Gohs, and Dr. Edward
Kifer along with the primary author developed many of the
measurement scales, questionnaires, and other instruments
used in the actual evaluation of the courses studied in the
project. In addition, these persons did the bulk of all of
the data processing and analysis which allowed the project
staff and the course instructors to determine how effective
the courses were and how valid and reliable the test and
measurement procedures were. Many of the examples presented in
the book include the instruments and the data developed by
these persons. Mr. Gohs and Mr. Lacefield also wrote
Appendix A. In addition, Mr. Gohs wrote some of the material
included in Chapter 13, particularly about the Urban Storm
Water Quality Modeling course used as an example of how to
measure and report learning outcomes.

Dr. Donna M. Mertens designed, conducted, and analyzed
the results of the evaluation of the Engineering Economics
course which was included as one of the courses studied. In
addition she wrote the initial draft of the last section of
Chapter 3 concerned with the motivations of professionals
for attending continuing education courses.

Dr. Billy Barfield and his partner Dr. Tom Haan, authors
of Hydrology and Sedimentology of Surface Mined Lands (1978),
provided a sterling example of a well designed and operated
continuing education course. Many of the examples in the book
are based upon the teaching of this well-organized and
excellent course. In addition, Dr. Barfield participated in
the ongoing seminars and discussions and provided the
invaluable insights and criticisms of an experienced engineer
and continuing education professor.

Ms. Joanne Wilson, Mr. Frank Gohs, and Dr. Edward Kifer
all provided the primary author a continuing and stimulating
dialogue about the topics of the book in another ongoing
doctoral seminar in educational program and product evaluation.
This seminar was taught in the Department of Educational
Psychology and Counseling during the fall 1979 semester. All
three individuals attended regularly, were part of the project
team, and greatly influenced the content of the book as it
was drafted by the primary author. In addition, all three
persons, as have other members of the project team, were
critical readers of the early drafts of the work and
made many suggestions leading to its improvement. Mr. Warren
Lacefield did an exceptionally detailed critical reading of
the second draft of the manuscript and made many improvements
in both the style and content of each chapter.

Ms. Melanie Barber typed the several drafts of each
chapter accurately, efficiently, and with good humor. She
also provided assistance in checking and locating references.

Many professors of continuing education courses in
engineering made the project possible by letting teams of
project members come into their classes and participate in
the instruction, submitting to interviews and many questions,
and developing tests for their courses, modifying the tests
to improve them after subsequent analysis, and seeing to it
that the tests were administered to the persons enrolled
in their courses. The author and the project staff are
particularly indebted to Dr. Billy Barfield, Dr. Thomas
Haan, and Dr. Michael Meadows.



It has been an honor to be part of this group and to
have been extended the privilege of being the primary
spokesperson for the collective wisdom of the staff.

Henry P. Cole

636 Bellcastle Road
Lexington, Kentucky 40505

March 30, 1980



TABLE OF CONTENTS

Chapter   Title

1    INTRODUCTION AND OVERVIEW

Part 1 - Basic Properties of Continuing Education Courses, Their
Development, and Evaluation

2    CHARACTERISTICS OF CONTINUING ENGINEERING
     EDUCATION COURSES

     Four Basic Types of Courses
        Remediation
        Extending Prior Knowledge and Skill
        Imparting Advanced Technical Concepts and Skills
        Exposure to Knowledge Outside of Engineering Science
     Different Instructional Purposes and Methods
        Across Course Types
     Assessing Learning Across Course Types
     Utility of Learning Outcome Assessments
     Limitations of Typical Learning Assessment Procedures
     Conclusion

3    CHARACTERISTICS OF COURSE PARTICIPANTS

     Focus on Practical Needs
     Professional Engineers as Students
     Adult Learners: Andragogy Versus Pedagogy
     Roles of Adult Learners in the Learning Process
     Roles of Adult Learners in Evaluation of Courses
     Motivations for Attending Continuing Education Courses
     Conclusion

4    FORMATIVE EVALUATION AND COURSE DEVELOPMENT

     Stages of Course Development
     Selecting Course Content
     Refining Course Objectives and Learning Assessment Tasks
     Measuring Only That Which is Appropriate
     Packaging and Delivery Considerations
     Selection of Instructors
     Specialized Equipment and Facilities:
        Implications for Course Delivery and
        Learning Assessment
     Practitioners' Tacit Evaluation of Courses
     Conclusion

5    SUMMATIVE EVALUATION OF LEARNING OUTCOMES

     Need for Summative Evaluation
     Differences Between Summative and Formative Evaluation
     Four General Learning Assessment Procedures
     Developing Sound Learning Assessment Procedures



Part 2 - Using Tests to Measure Learning Outcomes

6    PRE-TESTS, THEIR PURPOSES AND USES

     Pre-tests as Informative and Screening Devices
     Pre-tests as Devices for Adjusting Course
        Content and Operation
     Pre-tests as Indicators of Baseline Performance
     Learning Resulting from Pre-test Experiences
     Two Approaches to Constructing Pre-tests

7    EMBEDDED TESTS: THEIR PURPOSES AND USES

     Traditional Embedded Tests
     Abbreviated Forms of Embedded Tests
     An Example of a Course with Embedded Tests
     Reasonable Expectations for Achievement
        Within a Short Course
     Practical Problems in Using Complex Embedded
        Test Tasks in Short Courses
     Abbreviated Embedded Test Tasks: An Illustration
     Purposes and Properties of Abbreviated
        Embedded Test Tasks
     Advantages of Multiple Choice Items as
        Embedded Test Tasks
     Precautions in Developing Multiple Choice and
        Other Objective Test Items
     Importance of Item Independence
     Generalization of Item Construction Procedures
        to Other Test Formats
     Generalization of Item Construction Procedures
        to Pre-, Post, and Delayed Post Test
        Construction
     Conclusion



     Calculation of Item Difficulty and
        Discrimination Indices for Criterion
        Referenced Tests
     Item Analysis Procedures in Perspective
     Methods of Reliability Estimation: The NR
        and CR Cases
        Alternate Forms Method
        Test Retest Method
        Sub-divided Test Method
        Internal Consistency Methods
     Conclusion

12   LIMITATIONS OF TESTS

     Sources of Invalidity and Unreliability
     The Standard Error of Estimate
     Stability of Group Mean Scores
     Advantages of Criterion Referenced Tests
     Summary of Major Points Concerning Testing



Part 4 - Sharing Learning Assessments and Course Evaluations

13   REPORTING THE ASSESSMENT OF LEARNING OUTCOMES

     Participants' Needs
     Instructors' Needs
     Program Administration Needs
     Client Agency Needs
     Professional Societies' Needs
     Meeting Diverse Information Needs
     Basic Information: Student Achievement,
        Course, and Instructor Characteristics
     Gathering and Presenting Basic Achievement
        Data: An Example
     Reporting Learning Outcomes to Individual Students
     Keeping Learning Outcomes to Individuals Private
     Precautions to Prevent the Abuse of Test Scores
     Making Course Evaluations Public
     Conclusion

14   CONCLUSION - RECOMMENDATIONS FOR EVALUATED CEUs

     Courses Where Tests Provide Accurate Estimates
        of Learning
     Courses Where Tests are Inadequate Estimates
        of Learning
     Limited Time for Testing
     Inadequacy of Testing in Sampling the
        Performance Domain
     Growth of Learning After Course Completion
     The Need for Multiple Indicators of Learning
        Outcomes
     The Impossibility of Making "Complete"
        Learning Assessments of Individuals
     Means for Making Comprehensive Assessment of
        Course Effectiveness
     Logical Requirements for Certification of Courses
     Logical Requirements for Certification of Persons
     The Importance of Options for Participants
     Involving All Participants in Learning
        Assessment Activities
     Conclusion

REFERENCES

APPENDIX A

APPENDIX B

AUTHOR INDEX

SUBJECT INDEX (incomplete)



LIST OF TABLES

Table

1    Engineers' Interest in Non-Formal
        Education Programs
2    Rank Ordering of Motivational Factors for
        Participating in Continuing Education
3    Steps for Developing Pre-tests, Embedded Tests,
        Post Tests, and Delayed Post Tests by
        Which to Estimate Learning Outcomes
4    Performance Objectives for Open Channel
        Hydraulic Structures Unit: An Illustration
        of Test Construction Procedures
5    Test for "Open Channel Hydraulics" Unit
        Illustrating the Mapping of Items to
        Performance Objectives
6    Scores of Engineers on Alternate Forms of
        Three Criterion Referenced Mastery Tests
7    Pre- and Post Test Items and Their Assignment
        to Test Forms for the Urban Storm
        Water Course
8    Pre-test Total Scores Across Test Forms -
        Urban Storm Water Quality Modeling Course
9    Post Test Total Scores Across Test Forms -
        Urban Storm Water Quality Modeling Course
A-1  Demographic Information Questionnaire
A-2  Participant Reaction Questionnaire
A-3  Satisfaction/Utilization Survey Form
A-4  Structured Personnel Interview Protocol
10   Limiting Velocities and Tractive Forces for
        Open Channels
11   List of Considerations for Preparing
        Multiple Choice Items



LIST OF FIGURES

Figure

1    Pre- and Post Test Scores by Persons with
        Test Means, Standard Deviations, and Persons
        by Score Regression Lines for a Short Course
2    Illustration of a Graphic Means for Reporting
        Learning Outcome Measurements for a Short
        Course to Individual Participants and Groups
3    Learning Outcomes Resulting from a Short Course
        for Engineers on Urban Storm Water Quality
        Modeling
4    Learning Outcomes Resulting from a Short
        Course for Engineers on Urban Storm Water
        Quality Modeling
5    Sample Standard Answer Sheet for Manual or
        Machine Scoring
6    Manual Individual Achievement Reporting Form
7    Computerized Individual Achievement Reporting
        Form
8    Properties of Typical Channels
9    n-VR for Various Retardance Classes
10   Solution for Manning's Equation, Vegetated
        Waterways
11   Solution for Manning's Equation, Vegetated
        Waterways



1

INTRODUCTION AND OVERVIEW



This book is intended as a set of practical guidelines
for directors of continuing engineering education and others
involved in the task of helping to update the knowledge,
skills, and practice of the Nation's many engineers. The
guidelines may also be of value to those persons involved in
the continuing education of other scientific and technical
specialties.

The need to involve engineers and other technical
professions in continuing education activities throughout
their careers is grounded in a number of factors. First,
the present rate of knowledge expansion and technology
insures that continuing education in a broad range of
scientific and technical topics is required for maintaining
competence among engineers and other technical personnel.
Professional and scientific journals assist toward this end
as do formal courses of instruction and programs in the
engineering sciences at colleges and universities.

In addition, many industrial organizations engage in
research and development and provide instruction for their
technical staffs in the new knowledge developed from their
own and others' activities. Yet, there remains a strong
need for the systematic organization and efficient
presentation of basic and newer technical knowledge to a
wide audience of engineers through short courses,
conferences, workshops, evening classes, and similar activities
(Klus & Jones, 1975). Many of the Nation's engineers work
in small organizations not able to mount the ongoing
technical training programs common to some of the larger
firms. In addition, even the largest engineering firm or
company is not capable of offering the wide number of
continuing education courses and delivery modes needed even
by their own employees, much less meet the needs of technical
personnel from other agencies and areas. Well organized and
managed continuing education programs for the engineering
sciences need to be consistently available. A wide range of
courses needed by different persons is required. In
addition, multiple modes of course delivery are needed.

For example, practicing engineers today need courses in
areas as diverse as human relations skills, engineering
economics, recent technical developments in micro-processing
equipment, and effects of specific environmental toxins and
their proper management. Engineers increasingly have become
involved in long-term community and state planning.
Engineers are frequently the coordinators of industrial and
community development groups, the persons responsible for
knowing and insuring compliance with environmental protection
laws, and major consumers and users of recent technical
developments. It is impossible to teach the breadth of
knowledge and skills required to satisfy this range of
assignments in one or two professional degree programs
completed at a university or college. It is also unrealistic
to expect that this wide range of skills and knowledge will
automatically develop simply through "on the job training."
Each of these areas contains large amounts of technical
information and specific skills which often require additional
systematic instruction beyond that which can be received in
preparatory professional programs at colleges and universities.
Acquisition of this additional technical information by the
engineer, and his or her greater facility in specific
technical skills, are expected learning outcomes from most
continuing education courses. It is the measurement of
these and other related types of learning outcomes with
which this book is concerned.

The best ways to assist engineers in the acquisition of
these types of knowledge and skills depend on a number of
peripheral factors. These include: a) the geographic
location and distribution of the persons needing the
particular type of instruction; b) the content of the course,
its complexity, and its optimal duration; c) the specific
learning outcomes sought as the result of the course, e.g.,
increased awareness of the law or available technology,
specific improved performance in technical areas such as
use of micro-processors in the operation and use of
industrial production processes; and d) the characteristics
of the engineering students who will be involved, e.g.,
prior relevant technical training, previous work experience,
recency of formal courses and technical work in areas such
as mathematics and computer programming.

For all of these reasons, once a continuing education
need in a specific technical area is identified, one cannot
simply produce a standard course taught by traditional means.
Decisions concerning whether to offer the course as a short
course at a three day conference, as a once-a-week evening
class, as a correspondence course, or as a mail-out TV
cassette tape course (with accompanying readings, workbooks,
and manuals) should be based on considerations of geographic
distribution of participants, availability of qualified and
competent instructional personnel, the size of the
prospective student group, and many other factors. Other
decisions concerning course content and level(s) of
difficulty, the rate of presentation of material, the
optimum duration of instruction, and the amount of prior
participant skill and knowledge also have to be made.
These decisions depend heavily upon the characteristics of
the engineers to be enrolled.

In short, to be effective, continuing education programs
must not only teach content and skills needed by practicing
engineers, but in doing so must adapt to the characteristics
and limitations of the persons needing the instruction.
This makes offering a sound continuing engineering education
program a difficult task requiring much wisdom and good
information about the needs, characteristics, and activities
of these practicing professionals in their work roles.

These same factors also complicate the evaluation of
learning outcomes for continuing engineering education
courses. The decisions about the level(s) of difficulty of
courses, of duration and mode of delivery, and of the major
intended outcomes all directly influence the methods and
procedures used to evaluate course effectiveness and the
degree of individual student learning. The design and
implementation of the instructional system itself should not
be separated from the activities used to assess
degree of individual student learning and judge overall
course effectiveness. This principle was noted long ago by
Ralph Tyler (1950), and has consistently been observed to
be basic to good practice in the area of educational course
and program evaluation (Bellack & Kliebard, 1977). This
principle should be clear in order that the reader be
disabused of the idea that there is one simple "best" method
or set of procedures by which to evaluate the outcomes of
continuing education courses in engineering. There is
no one "best" method or procedure for the teaching or
evaluation of such courses. There are, however, some
well-established guidelines and alternative strategies for
constructing and evaluating continuing education courses in
technical fields with attention given to the variables noted
earlier.

In large part the present level of expertise in program
evaluation has grown out of concerns about the effectiveness
of large federally-supported curriculum development
activities (Grobman, 1968; Worthen & Sanders, 1973). These
and earlier curriculum development activities in public and
higher education, as well as in military training activities,
provide a good foundation for approaching the topic of
evaluation of learning outcomes in continuing engineering
education. It is the purpose of this book to set forth
this accumulated knowledge in the hope that it will serve
as a set of useful procedures for engineering educators.
The procedures offered are also grounded in the activities
of an interdisciplinary group of scholars from engineering,
higher education, and educational psychology. This group
has been engaged in the evaluation of learning outcomes for
multiple and diverse continuing education courses in the
engineering sciences.

(The activities of this group were supported by the National
Science Foundation, Grant No. SED78-22060, The Learning
Outcomes Measurement Project.)

The concepts of both formative and summative evaluation
will be used throughout this book (Bloom, Hastings, &
Madaus, 1971; Greenfield, 1978; Grobman, 1968). It should
be clear to the reader that these terms are being applied
to the class of courses concerned with the upgrading of
knowledge and skills of practicing engineers and related
technical personnel through a variety of short courses and
professional training seminars and workshops. The guidelines
presented are not intended for the development of formal
undergraduate or graduate engineering courses
taught over the course of a semester in typical college or
university programs which lead to a degree. However, many
of the principles set forth are useful in these situations
as well.

The focus is upon alternative methods by which to
evaluate the learning outcomes for an array of short courses
designed for the professional engineer. It is necessary to
attend to the purposes, objectives, content, organization,
packaging, delivery, and follow-up activities which are
usually part of such short courses if they are to be
evaluated properly. Evaluation is an integral part of the
development and delivery of any such course (Aleamoni, 1980;
Gagne, 1967; Grobman, 1968). This type of evaluative
activity is usually called formative evaluation, the means
by which courses are improved toward more effectively meeting
the needs of the persons who enroll in them.

Summative evaluation is also required in order to
report to the individual participant something about the
degree to which he or she has achieved the course objectives,
what more needs to be learned, and how competently the
procedures and skills learned in the course may be practiced
in daily professional activity upon returning to the work
site. Summative evaluation is also needed to describe the
general effectiveness of such courses in order that
businesses and agencies who send engineers and other
technical personnel to participate in training can be
informed about the general effectiveness of the training.
It also provides information required for periodic
adjustments and improvements in continuing education courses
and programs by the persons who develop and operate them.
Summative evaluations are descriptive and judgmental
statements about the intentions, procedures, and worth of
courses or programs taken as whole units. They literally
provide a summary of course operation and effectiveness in
meeting desired learning outcomes over a given time period
with specific groups of enrollees. Any summative evaluation
may also be used in a formative way to make improvements in
future replications of courses and their operation within
programs.

The remainder of this book is organized into four parts.
The chapters in the first part describe a typology of
continuing education courses. Different types of courses
serve different purposes and must be evaluated in different
ways. Another chapter deals with the characteristics and
needs of persons typically enrolled in continuing education
courses. Two additional chapters deal with formative and
summative evaluation and describe how these activities can
be used to help courses and programs to meet the needs of
the participants who enroll in them.

The second part of the book provides a detailed
description about the use of four different types of tests
in the measurement of learning outcomes resulting from
instruction. The information and examples provided
illustrate how the various types of tests should be used,
when they should be used, and how the tests should be
developed.

A third part of the book describes alternative
procedures for the development of valid and reliable tests
and measuring instruments by which to make inferences about
the degree of individual learning and overall course
effectiveness. The development of tests and the testing
activity is presented as an integral part of the process of
instructional development activities which are required to
create high quality instruction. The instructional purposes
and uses of tests and testing are emphasized. One chapter
is devoted to specific methods of test item analysis and
test reliability determination. Another chapter is devoted
to the limitations of tests and testing and serves to place
this method of assessment of individuals' learning and
evaluation of course effectiveness in a proper perspective.

The fourth part of the book is a single but extensive
chapter which describes and provides examples of how
assessments of individuals' learning and evaluations of
course effectiveness can and should be reported to various
persons and groups. The needs of individual students,
instructors, administrators, client agencies, and
professional societies for information about the
effectiveness of courses in achieving their intended learning
outcomes are described. Methods of meeting these diverse
needs for information about the performance of courses and
the persons who teach and complete these courses are
described. The types of information needed, methods for
gathering the information, and effective ways of presenting
the resulting data are all described. The privacy of
individuals' test scores and other learning assessments
and the public nature of the overall evaluation of course
and program effectiveness are emphasized. Precautions are
suggested to prevent the abuse of test scores.

Means for the evaluation of "courses and programs"
rather than "persons" are presented in the last chapter.
Courses where testing provides accurate estimates of learning
are described along with other courses where this is not the
case. Limited time for testing, problems of adequate
sampling of the performance domain being instructed, and
the anticipated growth in learning of course content after
course completion are all presented as problems which need
to be recognized and dealt with in the measurement of
learning outcomes resulting from instruction. The
impossibility of making "complete" assessments of an
individual's learning resulting from a given course is noted.
The use of multiple indicators of learning outcomes, item
sampling procedures, and comprehensive assessments of
course effectiveness are recommended rather than only the
assessment of individual persons' knowledge and skill by
common testing procedures. The last chapter pulls together
the previous sections of the book and makes strong
recommendations. It may be useful to read the last chapter
first and then proceed to other sections and chapters of the
book as it suits the needs and interests of the reader.

The detailed table of contents and this first chapter
should be of assistance to persons in making the decision
about where to begin. Although the book is written in a
sequential manner, it is also intended that persons having
interest in any particular topic or section will be able to
readily locate that section and profit from its study
without having to read other sections of the text.



Chapter 2

CHARACTERISTICS OF CONTINUING ENGINEERING
EDUCATION COURSES

There is a variety of common types of short courses
for engineers. Courses differ in purposes, content,
delivery method, packaging, and intended audience. Each of
these distinguishing characteristics will be examined and
implications for evaluation of the different varieties of
learning outcomes noted.

Four Basic Types of Courses

Continuing education courses for professional engineers
may be classified into four broad categories. These include
courses concerned with:

a) remediation and upgrading of basic knowledge and
skills;

b) extending and broadening previously learned scientific
and technical concepts and skills;

c) imparting new concepts and skills in technical areas
at high levels of expertise to keep pace with
advancing scientific discovery and technology;

d) acquiring new awareness, knowledge, and concepts
outside the areas of engineering and basic sciences
in order to perform in a more effective manner the
many other duties required of engineers in the areas
of economics, personnel and business management,
ecology and environmental protection, and community
development, organization, and problem solving.

Let us now consider examples of each of the four types
of courses. Frequent reference will be made to this typology
of continuing education courses throughout this book.



Remediation

The most common courses in this category are short
courses designed to improve basic concepts and skills to help
engineers pass state licensing examinations. These courses
are popular with recent graduates of engineering programs
who plan to sit for such examinations.

Another example includes short courses designed
to sharpen skills and concepts once learned, though not
recently used, but now in demand because of a new process
being widely adopted. Basic electrostatics and bonding
concepts in physics and chemistry, which were once learned
by most engineers but since forgotten, are an example of an
area in need in a period when many firms are developing
copying processes similar to xerography.

Extending Prior Knowledge and Skill

General courses which relate sophisticated concepts and
principles in different areas to one another for purposes of
integrating and extending concepts learned earlier are
found frequently in this category. An example might be a
course on casting of metal alloys from powdered metal in wet
environments. This course might involve information from
dentistry, industrial engineering research concerning
fabrication of certain machine parts, development of new
materials for repair of structural cracks, and low temperature
castings. Generally, the persons enrolled in the course are
[...] research, and [...] array of
applications. [...] integrating the prior knowledge of the
participants with recent developments in related [...].
Frequently, the motivation for attending is to increase one's
general knowledge about the topic area and keep abreast of
recent developments. The expected learning outcome
anticipated by the participants and proposed by the course
developers is often not a skill in relation to improved
on-the-job performance. Rather, it is a more general
knowledge and awareness of relationships.

It is worth noting that some participants may also
enroll in other types of courses for this same "general
knowledge improvement" outcome, even though the course may
be targeted to teach specific skills and concepts closely
related to job performance in specific technical areas. This
means that the typology of a given course is determined not
only by the course objectives and content, but by the
intentions, expectations, and motivations of course
participants as well. More will be said about this in later
chapters.



Imparting Advanced Technical Concepts and Skills

Courses on microcircuits and microprocessors to update
the engineer on the latest thinking and development in this
field and to call attention to the many potential
applications to his or her area of work are one example
of this type of course. Typically such courses are highly
specialized and focused on specific learning outcomes.
They often require high levels of expertise and knowledge
in prerequisite capabilities. For example, an advanced course
concerning the use of microprocessors to aid industrial
production control processes would usually require
participants to be facile in computer programming,
electronics, optimal control theory, and the basic
mathematical procedures which underlie all three areas. The
participants enrolled in these types of advanced technical
courses often are motivated to attend because they expect
to learn very specific methods and procedures directly
relevant to their ongoing work.

Exposure to Knowledge Outside of Engineering Science

Courses on current environmental standards for air and
water pollution designed to inform engineers of the
allowable limits, methods of analysis, and methods to
determine cost efficient ways to reduce levels of chemicals,
particles, and other by-products and waste materials in the
environment are examples in this category. Courses
concerned with teaching the skills of human relations and
communications to engineers in order that the village or
city civil engineer can learn how to mediate more effectively
between opposing and often highly antagonistic individuals
from business, industry, union, consumer, and environmental
protection groups are still another example. The need for
courses similar to these examples increasingly is being
recognized. The city, village, or project engineer must
often mediate among diverse community groups and get them
to work together in the task of meeting environmental
standards for air and water purity or plan long term
community development. Most engineers have had little
formal training in the skills required for effectively
assuming this role. Yet, engineers are frequently called
upon to serve in facilitating, counseling, and group
leadership roles. Because of this deficit in their prior
professional education, continuing education needs exist
for courses in engineering and community economics, in
communications and leadership skills, and in conflict
resolution and human relations skills to better enable
engineers to be more effectively involved in community
development activities.

Different Instructional Purposes and Methods Across 
Course Types 

It should be noted that the different types of courses
have not only different intended outcomes, but that the
means by which to measure these outcomes are also different.
Courses of the first type, concerned with remediation of
basic knowledge and skill for purposes of scoring well on
a professional licensing exam, are similar to typical college
courses. For the course concerned with basic knowledge and
skills the most appropriate outcome measure is the success
of the course participants in passing the professional
licensing examination. The most appropriate measures for
assessing performance before, during, or after the course
are test items sampled from the content domains on which the
persons will be tested on the professional licensing
examination.
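
The idea of sampling test items from the content domains of the
licensing examination can be illustrated with a brief sketch. It is
only one possible approach, offered under stated assumptions: the
domain names, the item bank, and the domain weights below are
hypothetical and are not taken from any actual licensing examination.
The sketch allocates quiz items to each domain in proportion to its
assumed weight and then draws items at random within each domain.

```python
import random


def sample_items(item_bank, weights, n_items, seed=0):
    """Draw n_items from item_bank, allocating items to each content
    domain in proportion to its (assumed) weight on the examination.

    item_bank: dict mapping domain name -> list of item identifiers
    weights:   dict mapping domain name -> relative weight (hypothetical)
    """
    rng = random.Random(seed)
    total = sum(weights.values())

    # Largest-remainder allocation so the counts sum exactly to n_items.
    counts = {d: (n_items * w) // total for d, w in weights.items()}
    remainders = {d: (n_items * w) % total for d, w in weights.items()}
    leftover = n_items - sum(counts.values())
    for d in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[d] += 1

    quiz = []
    for d, k in counts.items():
        if k > len(item_bank[d]):
            raise ValueError(f"not enough items in domain {d!r}")
        # Sample without replacement within the domain.
        quiz.extend(rng.sample(item_bank[d], k))
    return quiz


# Hypothetical item bank and weights, for illustration only.
bank = {
    "statics": [f"statics-{i}" for i in range(20)],
    "circuits": [f"circuits-{i}" for i in range(20)],
    "thermodynamics": [f"thermodynamics-{i}" for i in range(20)],
}
exam_weights = {"statics": 50, "circuits": 30, "thermodynamics": 20}
quiz = sample_items(bank, exam_weights, n_items=10)
```

Drawing a fresh sample for each quiz (by changing the seed) gives
participants repeated practice across the whole domain rather than on
one fixed set of problems.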

One of the most appropriate learning activities for
courses directed toward remediation and upgrading of basic
knowledge and skill is the completion of many practice
problems sampled from the broad domain on which the student
will be tested. These characteristics also have implications
concerning the most effective duration, organization, and
delivery of the course.

With the basic remediation course, distributed practice
with many sessions interspersed with homework, the correction
of homework problems, and frequent quizzes is the most
reasonable approach. These assessment procedures serve to
inform the learner and the instructor of an individual's
progress. Areas needing additional study by individuals and
group instruction by the teacher are clearly identified
during the course of instruction. A different approach to
teaching and assessment of learning is required for courses
in the other areas, such as short duration, intensive
seminars or workshops. These other types of courses may be
very effective, particularly if the participants have the
necessary prior knowledge, skills, and an active need for the
new information, but they should not be planned, taught, or
evaluated in the same way.

The third type of course, concerned with imparting
specialized high levels of technical knowledge and skill to
a person already highly skilled in an area, is probably the
type of course for which prerequisite knowledge and skill
levels are most important. Without the appropriate level
of entry skills for such a course, little learning can
occur because participants will be unable to understand and
perform the learning activities comprising the course.
Therefore, in these courses, some sort of screening by a
performance task or test, and adequate listing of
prerequisite skills for participants before registration is
very important. This can often be accomplished by clearly
stating the course prerequisite skills and knowledge
requirements when advertising the course. Pre-tests may
also be used to determine the entry level skill and knowledge
of participants, and to advise persons of their readiness
for the course. Pre-tests also serve to communicate to
prospective participants useful information about specific
course content.
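
The advisory use of a pre-test described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not
a recommended standard: the prerequisite areas are those named earlier
for the microprocessor course, while the score scale (proportion
correct) and the cutoff values are hypothetical.

```python
def readiness_report(pretest_scores, cutoffs):
    """Compare pre-test scores with cutoffs for each prerequisite area.

    pretest_scores: dict mapping area -> proportion correct (0.0 to 1.0)
    cutoffs:        dict mapping area -> minimum proportion (hypothetical)
    Returns (ready, areas_to_review); an area with no score counts as 0.
    """
    areas_to_review = sorted(
        area
        for area, cutoff in cutoffs.items()
        if pretest_scores.get(area, 0.0) < cutoff
    )
    return (not areas_to_review, areas_to_review)


# Hypothetical cutoffs and one prospective participant's scores.
cutoffs = {
    "computer programming": 0.6,
    "electronics": 0.6,
    "optimal control theory": 0.6,
}
scores = {
    "computer programming": 0.8,
    "electronics": 0.5,
    "optimal control theory": 0.7,
}
ready, review = readiness_report(scores, cutoffs)
```

The report is meant to advise the prospective participant about areas
to review before enrolling; consistent with the cautions raised
elsewhere in this report, it should be shared privately with the
individual.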



Assessing Learning Outcomes Across Course Types

What can be accepted as evidence of achievement of the
intended learning outcomes for these different types of
courses varies across the four categories. In most cases,
measures which sample the individual's performance on the
job or on tasks similar to those encountered on the job are
the best indicators of competence. Since it is not possible
to include such real performance learning opportunities and
assessment tasks within courses in full measure, it is
necessary to design learning activities and testing
situations which sample some of the key parts of the
performance required on the job. A good performance task
for an advanced technical course on microprocessors could
include the assembly of electronic circuits which would
perform an information processing task. The validation of
the performance of the circuit and its proper interfacing
with laboratory equipment might be another type of
performance task by which to assess the degree of learning.
The proper operation of the circuit and the student's
ability to validate the circuit in a laboratory situation
is a rigorous test of the student's performance. No other
test is needed. These types of assessment tasks can be
built into the instruction of the short course, and, indeed,
are most appropriately administered in this context.

In contrast, in a course designed to foster increased
skills in human relations and communications, it may not
be possible to so thoroughly assess the acquired learning
outcomes. Often the performance measure most appropriate
in such situations is the presentation of a series of filmed
role plays or verbally presented episodes. These can be
followed by paper and pencil tests requiring the analysis
of the roles and characteristics played by various persons
in the episodes. Based upon course principles, each participant
can be asked to develop a plan or course of action for some
real-life situation involving the meeting of disparate
groups he or she is expecting to work with in the future.
An additional requirement might be that the plan be developed
as a product the course instructor can evaluate.

For both the microprocessors and human relations skills
courses, the best performance measure is follow-up
observation of the degree to which the course participants
put the new knowledge and skill to use and the competence
they exhibit in doing so. However, such information, beyond
self-report or supervisor ratings, is difficult to obtain.
For this reason, test tasks and other assessment methods
used within the confines of a course, or sent to participants
as a follow-up activity and returned for scoring by the
course instructors, are often more practical and can be very
helpful in assessing the quality of the course and the amount
of student retention and learning.



Utility of Learning Outcome Assessments

It should be apparent that the information obtained
from the assessment of course learning outcomes is valuable
to several groups. First, it is of great value to the
persons who develop, teach, and redesign the courses
themselves. Any course can be improved if systematic information
about how well it reaches selected learning outcomes is
attended to. It is often necessary to ask why these outcomes
have been realized or not realized, and not only how well.
Answering the why question often calls for good descriptions
of how the course operated, the attitudes and behavior of
the participants and instructors, the appropriateness of
the content, and the physical features of the situation in
which the course was instructed. For example, even the best
designed course may not work well if offered by a belligerent
instructor, late in the evening, in a poorly ventilated and
illuminated room. Such conditions may cause participants
to lose interest, become hostile, not pay attention, or drop
out of the course. Consequently, it is important to
systematically seek information about how courses are
presented and managed as well as about the learning outcomes
of participants on tests and other performance tasks
(Worthen & Sanders, 1973). Questionnaires, interviews,
[...] useful toward this end.



Another group with strong interest in learning outcomes
of short courses and related training activities is the
participants themselves. Contrary to popular opinion,
participants do not mind taking tests and being assessed by
other performance measures (Ferry, 1979; Moss, Barfield, &
Blythe, 1978). However, the tests or procedures must be
reasonable in length and difficulty, directly related to the
area of instruction, and the results useful to the learner
in self-assessment of present levels of skill or knowledge.
The results of assessments of participants' learning should
always be promptly reported to the individual. It should
also be done privately. An individual's performance
assessment should not be made as a public announcement
(Wolf, 1974). Later in this book methods and means of
efficient reporting of learning outcomes of participants will
be described and illustrated.

Employers are another group of persons with legitimate
interests in the results of learning outcomes of workers who
have participated in continuing engineering education courses,
especially since the employer often provides release time
from work and payment of tuition costs for the participant.
Employers have a right to know the past general success rate
of particular courses with groups of participants similar
[...] interests in the progress of individual employees.



Professional societies and the administrative officials
who oversee continuing education programs have similar
needs.

Limitations of Typical Learning Assessment Procedures

Caution must be observed in reporting results of
learning assessments and tests of individual participants as
the only measure of or the definition of learning which has
occurred. Typical assessment procedures or tests always
test less than the functional domain of skill and knowledge.
Very important learning outcomes are often not measured
by any given test or assessment procedure. For example, a
person may complete a course on the construction of
sedimentation basins and stream channels for control of water
runoff on surface mined lands. Perhaps this person scores
very poorly on the post test for the course, indicating he
has learned little. However, the person may be a former
land surveyor whose present job in state government requires
him to have considerable skill and knowledge in this new
area, perhaps because he is an inspector. Now suppose this
individual is very worried about his lack of knowledge in
this specialty. Perhaps he decides to study the course
materials on his own after the course. Because of the course
and the results of the post test, he can now identify what
basic knowledge, computational algorithms, and procedures
he needs to understand. Perhaps, also because of the course,
the individual is able to identify another professional in
attendance at the course to whom he can turn for assistance.

Suppose the participant seeks the assistance of this
colleague in future assignments and that, additionally, he
reports his own assessment of his learning needs to his
supervisor and requests that he be allowed to attend the
course again. If he follows this plan of action, he has,
indeed, exhibited some very major and important learning
outcomes, despite his low post test score. Certainly his
behavior is a likely indication of eventual improved
performance in his job. In any event, the individual has
achieved an important learning outcome, a more informed
knowledge of his present limitations and how to correct them.

One can also anticipate the reverse situation in which
a person scored high on the formal assessment procedure or
test for the course, but really remained functionally unable
or unwilling to put into daily practice what she or he had
learned.

The point of all of this is simple. Test scores and
other short and artificial assessment procedures are only
tentative indications of what a person knows and has learned.
Actual performance and change in performance and work
activities on the job are much better indicators of the value
of the course with respect to learning outcomes for a given
course. There is a tendency for professional engineering
societies and academics concerned with documentation of
amount of learning resulting from continuing education
courses and CEUs to ignore these facts.



Conclusion



In summary, many engineers, engineering educators, and
some members of professional engineering societies expound
the view that there ought to be a straightforward way by
which to measure the learning outcomes of courses. The
viewpoint is commonly stated as an expectation that it is logical
and possible to assess learning in such courses in terms of a
given test with a test score or some other numerical
performance score. While it is certainly possible to design
good tests which measure aspects of learning in very reliable
ways, and while it is possible to devise very good functional
performance tests, it is very difficult and costly to
make a thorough assessment of the learning outcomes of a
particular course for a particular engineer. The common
and naive wish for a simple 10 minute pre-test and another
10 minute post test by which to ascertain the precise
amount of learning of an individual continuing engineering
education student in a given course is not realistic. It is
not that such tests cannot be developed to be highly reliable
and valid to a specific outcome. They can be quite easily.
Rather, it is that the inclusiveness of the test is almost
always much less than the intended and important outcomes
of the short course. Therefore, when one uses such a test,
one knows only that the participants know so much or so little
about that specific aspect of the course and its content. It
is not appropriate to generalize from that one test score to
matters of: a) whether or not the person will use the
knowledge gained through the course in actual practice; b)
whether he or she can actually use the knowledge or skill in
varied and real work related tasks; and c) whether or not the
person has learned anything of lasting significance simply
because this particular test score is high or low.

Recognizing these limitations should not cause the
reader to despair. There are many methods by which reasonable
estimates may be made about the effectiveness of specific
courses in achieving their intended, and sometimes unintended,
learning outcomes. Later chapters in this book detail many
of these methods.



Chapter 3

CHARACTERISTICS OF COURSE PARTICIPANTS

The participants enrolled in continuing engineering
education courses are distinctly different from persons
enrolled in undergraduate and even graduate courses in
engineering degree programs at colleges and universities.
This is an important point, for it reminds us that the
persons who voluntarily attend short courses, on their own
or as representatives of their companies or agencies, do so
for reasons different from those of persons enrolled in
courses within degree programs. Consequently, instruction
should be different for this population.

Focus on Practical Needs

One difference is that the short course participants
are already practicing engineers or engineering technologists,
in most cases. The usual concerns that faculty members
exhibit about the quality of the engineering student and his
or her qualifications as a practitioner are moot points with
the short course enrollee. By and large these persons are
already practicing engineering, qualified or unqualified.
They are qualified by virtue of holding the position for
which they earn a living doing engineering within some
agency or business. The concern of the short course
instructor should be much more on the functional, efficient,
safe, and wise performance of the engineer in some special
area for which the course is specifically designed.

A second difference between short courses of this type
and more traditional college and university courses is that
the former are more focused on specific skills, concepts,
and procedures, while traditional courses are often much more
diverse, general, and theoretical. In traditional courses
in engineering and basic sciences, professional engineering
societies and academics decide what knowledge and skill is
basic to practicing engineering. This core becomes the
content of the curriculum. In short courses for practicing
engineers, academic standards, basic theory, and curriculum
objectives of traditional academic courses are all secondary
to the functional needs of the practicing engineer, who is
often returning to learn something specific about some area
of performance which needs to be improved or some topic of
special interest.

Michael Scriven (1977), an expert in educational
program evaluation, makes a very nice distinction between
objectives and needs. He notes that when teachers and
other operators of social programs have a captive audience,
they talk about "meeting the objectives of the program."
These are objectives of the person or persons who design,
plan, and operate the program. However, when the
audience is not a captive one, when persons are spending
their own money in search of new ideas, skills, or things,
the focus is almost never on "objectives" but on "needs."
This distinction is the basic difference between traditional
undergraduate and graduate engineering courses in degree
programs and short courses of a continuing education nature
for professional engineers.

The persons who come to such courses care very little
about the objectives of the instructor or the university.
They care a great deal about meeting their personal needs
related to performing better in their work. The objectives
of the course are useful to these engineers only insofar as
they communicate to the individual participant the nature
of the course and what might be learned from participation
in the course. Objectives can also provide information
about prerequisite levels of skill prior to entering the
course, information useful to the person in assessing his
or her own readiness to enter the course and profit from
the experience.

Likewise, evaluations of the typical learning outcomes
from such courses for earlier groups are of interest to
participants and employers. These past evaluations provide
information about the anticipated utility of the course prior
to spending one's own money and time in completing it.



¹Comments made in an address to the students and faculty of
the University of Kentucky Graduate School by Michael Scriven,
March 12, 1979.



Evaluations of the general effectiveness of such courses
are something akin to "fairness in labeling" in medication,
consumer products, and other areas. A company or an
individual has a right to know the general effectiveness of
a course in terms of improved knowledge, skill, and
performance on the job. In fact, it is just such types of
evaluation, usually accomplished informally by questioning
and by observation by co-workers and supervisors of persons
who have completed specific short courses, which are the
basis for future course enrollment and success.

In short, if the course is well designed to meet the
needs of the participants in particular areas of focus, and
if there are not readily available and efficient ways to
meet this need other than the short course, then the course
is likely to be very successful. Its effectiveness will be
recognized by the persons functionally engaged in this area
of practice. The course subsequently will be praised by
word of mouth in professional and corporate circles, and
consequently become heavily enrolled. It is good that this
practical form of tacit evaluation exists and it ought to
be encouraged.

This distinction between persons enrolling in continuing
education short courses and more traditional degree program
courses also has implications for how students should be
instructed. Before detailing these differences in
instructional methodology and style, it will be helpful to
review information about the typical characteristics of
persons enrolled in engineering short courses.

Professional Engineers as Students 

Adult learners who participate in continuing engineering
education programs differ in a number of respects from
undergraduate and graduate students. These differences have
implications for both administrators and instructors of
such programs. A survey of some 257 continuing engineering
education participants at courses offered by the University
of Wisconsin pointed out some of the major differences
(Klus & Jones, 1975). This survey revealed that continuing
education participants are usually: a) older, having a
median age of 30-40 years; b) practicing engineers rather
than full-time students; and c) seeking to upgrade previous
knowledge and skills for direct and immediate, rather than
deferred, application to their jobs. In addition to
participation in formal college or university course work,
fifty-four percent of the practicing engineers in the survey
reported strong interest in various types of continuing
education programs and activities (see Table 1). The
completion of in-house educational programs was ranked
highest in terms of a preferred mode for continuing
education. The completion of short courses and similar
continuing education activities sponsored by professional
societies and governmental agencies was also rated highly.
Study through formal college credit courses was least
valued as a mode of continuing education.

Table 1

Engineers' Interest in Non-formal Education Programs
(N = 257)

                                          Percentage of Persons
                                          Expressing a Preference
                                          for This Type of
Type of Educational Program               Educational Program

Reading current engineering and
technical literature                              54%

Completing a minimum 1 credit course              18%

Completing in-house educational
programs                                          74%

Completing professional society &
government sponsored programs                     55%

Source: Klus, J. P. & Jones, J. A. Engineers involved in
education: A survey analysis. Washington, D.C.: American
Society for Engineering Education, 1975.



Because these course participants are usually working
full-time in their profession, the time which they have
available for instruction and study is more limited than
that of their graduate and undergraduate counterparts.
Therefore, instructors are forced to develop and use
time-efficient methods in their course offerings for
continuing education participants.






While many undergraduate and graduate students in
formal college and university programs would like more
instruction in practical matters, the typical graduate
engineers enrolled in a continuing education course often
demand learning experiences with immediate application. In
the case of more general courses designed to integrate
concepts and principles rather than focus on particular
skills, participants demand a sharp focus on specific topics
and time-efficient instruction. Rarely is more than a
few days available for the presentation of the material.
Employers have similar concerns.

A recent survey of graduate engineers indicated that
65 percent of the participants in a group of continuing
education courses in engineering at the University of
Kentucky said an important consideration influencing their
attendance was that their expenses were paid by their
employers (Mertens, 1979). Employers are very concerned
with the direct relevance of courses to the daily work
activities of their employees. This fact is widely known
to persons operating continuing education courses and
interacting regularly with employers of engineers. Given
this concern and the fact that many engineers do wish to
attend short courses and other forms of continuing education
activities, it is important to recognize the strong need
for applied and practical courses.




Since most university-based continuing engineering
education programs are taught by professors who also teach
at the graduate and undergraduate levels, it becomes
necessary for these instructors to "shift gears" in order
to provide the learning outcomes desired by participants
in continuing education courses. Faculty development and
inservice education activities may be beneficial to staff
from engineering colleges, who usually teach regular courses
in formal degree programs. These professors often can learn
much about the needs of graduate engineers in continuing
education courses, the characteristics of these learners
which distinguish them from younger undergraduate students,
and means to develop and present effective and efficient
short courses to meet these needs. The teaching methods and
organization of instruction in short courses need to be
more focused and more skillfully articulated and executed
than in courses where more time is available. The course
content also needs to be more sharply delineated. Professors
accustomed to teaching more traditional college courses
often have much to learn in these and related areas before
they can become effective developers and teachers of
continuing education courses. A comprehensive listing and
description of the specific skills and competencies needed
by continuing education faculty is provided by McCullough
(1980). McCullough also provides a questionnaire for
assessing faculty competence in each major area of
performance. The questionnaire and the performance categories
upon which it is based are useful in determining what
specific skills need to be developed in persons who are
assigned the task of developing, operating, and evaluating
continuing education courses for engineers and other
technical personnel.

The preparation of faculty for this "new" role can be
facilitated by the involvement of experts from other
disciplines such as adult education, educational psychology,
and instructional design. Any inservice education of
faculty in the design and operation of continuing education
courses also ought to include direct experience with well
designed continuing education courses and the persons who
develop and regularly teach these courses. The most exemplary
teachers and courses may sometimes be from outside the
academic community of major universities and colleges.
Inservice education efforts are frequently needed to assist
engineering faculty to meet the challenges posed by a
different, more adult, experienced, and practically oriented
clientele encountered in short courses. The focus on
teaching very specific topics and skills within a very
restricted time frame also demands major adjustments in
teaching style.

Adult Learners: Andragogy versus Pedagogy

In recent years adult educators have differentiated the
assumptions concerning adult learners from those
traditionally linked to children. One such educator,
Malcolm Knowles (1970), has coined the term "andragogy" to
describe the art and science of helping adults learn, as
contrasted with "pedagogy", the art and science of teaching
children. Knowles' concept is based upon four assumptions
concerning changes which occur when a person matures.
These include: a) the movement of self-concept from
dependency toward self-direction; b) the accumulation of
experience which becomes a valuable learning reservoir;
c) the orientation of the individual's learning readiness to
the developmental tasks of his or her social roles; and
d) the shifting of time perspective from postponed to
immediate application of learning. As learners become adult
their orientation increasingly turns from one of
subject-centeredness to problem-centeredness (Knowles, 1970).

The acceptance of these assumptions concerning adult
education has implications for administrators and
instructors active in continuing engineering education.

Roles of Adult Learners in the Learning Process

If adult learners are self-directed, in varying degrees,
it follows that they need to become involved in the total
learning process as much as possible. This includes:
a) assisting in planning the learning objectives and
activities; b) actively engaging in the learning experience
itself; and c) evaluating the learning experience in terms
of its outcomes and their worth.



The persons who develop and operate continuing education
courses for engineers should routinely involve samples of
actual persons employed in the target area for their courses.
These persons, as well as their employers, should be asked
to make judgments and comments about the proposed objectives,
content, organization, and teaching location for courses which
are under development. Sometimes surveys of the educational
needs of the population of engineers in a region should be
undertaken before courses are developed. This type of
activity is commonly referred to as conducting a "needs
assessment". Gathering such information about needs and
exposing proposed course objectives, materials, and
procedures to prospective enrollees can serve two purposes.
First, it can improve the course which is finally developed.
Second, it can alert professional engineers and their
employers that there is a local group of engineering educators
who are competent in many areas and genuinely interested in
the needs of working professionals. These activities improve
courses and instruction and also build rapport with the
professional engineering community, which otherwise may be
unaware of the local expertise and resources available to
them. These experienced adults who make their daily living
by doing engineering activities are very qualified and
capable of planning what they need to learn and how to
accomplish it.

Adult learners should be directly involved in the
teaching-learning activities with each other and with the
instructors in ongoing courses. Course participants will
frequently have example applications, problems, and prior
experiences which can be shared and can amplify the points
made by course activities and the instructors. It is
important that instructors respect the maturity and experience
of participants and recognize the value of seeking out
and using their contributions, criticisms, observations, and
ideas. Arrogant instructors who feel that participants are
often incompetent in basic areas, must be told what to do,
and must always bow to the expertise of the instructor will
usually run into much difficulty with such groups. The
more appropriate relationship is one of a tutor, expert
in some areas, who seeks to share some of this expertise
with fellow professionals who have actively committed the
time and energy to come together and study, question, and
learn about an area of interest to them. Some college and
university instructors accustomed to playing the authoritative
professor role in traditional courses have difficulty adopting
the partnership role necessary for continuing education
courses.

Roles of Adult Learners in Evaluation of Courses 

The adult learners enrolled in continuing education
courses are also a very important source of judgment about
the effectiveness of the courses in achieving certain
outcomes, some expected and others unexpected. This group
also can judge the worth of courses, and parts of courses,
in promoting worthy outcomes. Worth is usually defined as
learning something useful which somehow facilitates one's
work activity or some aspect of this activity. Consequently,
the judgments of course participants, and their employers
and supervisors, about the worth of courses are important
information which should be routinely sought out, collected,
and processed by continuing education program operators.
Formal testing and performance assessment procedures, as
well as interviews and questionnaires, also can provide
information about the worth of courses and lead to their
improvement.

Instructors and administrators charged with the
responsibility of continuing engineering education have
traditionally shied away from formal evaluation techniques.
Such persons often feel that adult learners are reluctant
to be judged by their peers and, thus, would perhaps not
attend programs providing this type of evaluation of
individual student learning outcomes. Recent studies have
shown that this is generally a false assumption (Ferry, 1979;
Moss et al., 1978). Interviews conducted by the Learning
Measurement Project and earlier studies by the University
of Kentucky's College of Engineering reveal that
participants are willing to have objective evaluations made
of their learning. This is especially so if the persons
who teach the courses view the learners as adults and
convey the genuine desire to evaluate the course and program
and not only the individual learner. Students who recognize
that the results of formal testing and other performance
assessment procedures will be used to provide evaluations of
course effectiveness and lead to improved course organization
will generally be quite willing to be assessed. This is
even more true if the students are aware that the tests and
assessment procedures have been carefully constructed and
are reasonable estimates of their knowledge and skill in
specific aspects of the course.

While program administrators must be aware that
participants may be reluctant to take tests, particularly if
they are poorly designed and improperly used, they should
also be aware of the growing demand from many sources for
valid verification of learning outcomes. These sources
include employers, who often pay all or part of course fees;
universities; professional societies; and accrediting
agencies. Although additional research is needed in
this area, studies completed to date appear to show that
engineers attending continuing education courses do not
resist objective measures of performance, particularly if
these measures help them assess their own level of learning
and contribute to course improvement. The key question
should always be, "How effective is this course?" After
that question is answered it is appropriate to ask, "How much
did this particular participant learn about specific intended
outcomes and how can his or her learning be accurately
estimated?"



Adequate evaluation, it must be recognized, can be
costly, both in time and money. However, objective
evaluation of learning outcomes can serve several purposes.
These include: a) meeting the demands for accountability
of outside agencies; b) enabling continuing education
administrators to more effectively meet the needs of their
clientele; c) providing course instructors with objective
data to determine how much learning has resulted from
instruction; d) providing a method for evaluation of
effectiveness of course instructors; and e) demonstrating to
course participants and employers objective methods for
determining course learning outcomes.

Motivations for Attending Continuing Education Courses 

The engineer's motivation for participating in continuing
education has implications for both the advertising of
programs and the instructional design of courses within
continuing education programs. Information about
participants' motives for attendance can be used to direct
the content of the promotional materials to potential
enrollees and their employers, thus resulting in attracting
larger numbers of interested students. In addition, course
content and instructional techniques can be designed to
enhance student motivation and achievement (Cole, 1980).

The general motivation of adults for participating in
continuing education has been studied by a number of
researchers (Monstain & Smart, 1974). However, literature
specifically about the engineer's motivation is scarce.
Wiesehugel (1978), using a taxonomy proposed by Miller (1977),
studied the motivational factors in a group of professional
engineers. Wiesehugel found that the most popular reasons
for attending continuing education courses were payoff from
previous study, acknowledgement of a changing knowledge
base and the need to remain abreast, absence of accepted
certification in one's field, and upward aspirations.
Wiesehugel's results are of interest because his sample
consisted of professional engineers. However, his study
was not as conceptually or methodologically sound as
would have been desirable.


Based on the work of Wiesehugel and Monstain and Smart,
an exploratory study was conducted by Mertens (1979). This
study was part of the Learning Outcomes Measurement Project
conducted at the University of Kentucky. The nature of
the study and a summary of the results are presented below.
Although the results are based upon only one course, which
falls into the fourth category of course typologies, the
results are informative.

The Mertens (1979) study was based on responses of 179
professionals enrolled in a television-presented course in
economic analysis for engineers. The course was sponsored
by the Appalachian Education Satellite Program, and the
participants were located at 25 sites in Appalachia. The
participants represented primarily civil engineers (25%),
followed by "other" (22%), mechanical engineers (19%),
electrical engineers (15%), and chemical engineers (2%).
The "other" group included non-engineers, engineering
assistants and technologists, and planners.

The participants completed a pre-course survey which
included questions about demographic information, sources of
tuition fees, and whether employers had recommended
attendance. In addition, 19 items believed to be related to
participants' motivation for attending the course were listed.
Participants were asked to rate the importance of each item
for determining their decision to enroll in the course. A
five-point Likert scale was used to rate each item.
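
The summary described here (a mean and a standard deviation per
item, with items ranked by mean) can be sketched in a few lines
of Python. This is only an illustration under stated
assumptions: the ratings below are invented rather than taken
from the survey, the name rank_items is hypothetical, and the
sample standard deviation is used because the report does not
say which formula was applied.

```python
from statistics import mean, stdev

# Hypothetical ratings, one value per respondent
# (1 = very important ... 5 = very unimportant).
# These numbers are invented for illustration only.
ratings = {
    "To learn new ideas that might enhance my job performance": [1, 1, 2, 1, 2, 1],
    "My expenses were paid": [2, 5, 3, 4, 1, 5],
    "Interest in the subject": [2, 2, 1, 2, 3, 1],
}


def rank_items(item_ratings):
    """Return (item, mean, s.d.) tuples ordered from most to least important.

    A lower mean means the item was rated as more important on this scale.
    """
    summary = [
        (item, mean(values), stdev(values))  # stdev = sample s.d. (assumption)
        for item, values in item_ratings.items()
    ]
    return sorted(summary, key=lambda row: row[1])


ranked = rank_items(ratings)
for position, (item, avg, sd) in enumerate(ranked, start=1):
    print(f"{position:2d}. {item:<58} {avg:4.2f} {sd:4.2f}")
```

In this invented data the item with the widest spread of ratings
also has the highest (least important) mean, which is the kind of
pattern a rank-ordered table of means and standard deviations
makes visible.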

The results of the participants' response to the
motivational factors are presented in Table 2. The
mean ratings of the 179 participants are presented in rank
order of importance. The highest rated items pertained to
professional advancement, e.g., "To learn new ideas that
might enhance my job performance", and "To acquire specific
knowledge of a field or subject." The lowest rated items
were related to external expectations or influences, e.g.,
"I want the certificate that is awarded at the end of the
course," and "My agency/supervisor strongly recommended
that I attend." It is also interesting to note that the
standard deviations of the highly rated items are uniformly
small, while the standard deviations of the lowly rated items
are much greater. This indicates that there was much common
agreement among participants about the highly ranked items
as motivational factors for attending the course and less
agreement about the other factors ranked lowly.

Table 2

Rank Ordering of Motivational Factors for
Participating in Continuing Education*

     Motivational Factor                          Mean    s.d.

 1.  To learn new ideas that might enhance
     my job performance.                           --      .66
 2.  Interest in the subject.                      --      .62
 3.  To acquire specific knowledge of a
     field or subject.                             --      .72
 4.  To learn some of the newer techniques
     of economic analysis.                         --      .89
 5.  To acquire a new set of skills.               --      .81
 6.  The opportunity for professional
     advancement.                                  --      .96
 7.  The opportunity for intellectual
     stimulation.                                 2.21    1.16
 8.  The location was within commuting
     distance.                                    2.25    1.22
 9.  To do economic analysis.                     2.39    1.11
10.  To refresh my skills in an already
     familiar area.                               2.64    1.27
11.  Have taken continuing education courses
     prior to this one and found them to be
     of value.                                    2.81    1.19
12.  To meet with my colleagues and exchange
     ideas with them.                             3.02    1.09
13.  My boss wants me to go to school.            3.19    1.36
14.  Taking this course may help me maintain
     my present situation.                        3.26    1.33
15.  My expenses were paid.                       3.28    1.45
16.  My agency/supervisor strongly
     recommended that I attend.                   3.44    1.34
17.  Need more education.                         3.44    1.28
18.  Want the certificate that is awarded
     at the end of the course.                    3.49    1.32
19.  Know other engineers who are better off
     because they took a course in
     engineering economy.                         3.62    1.12

*Rating Scale: 1 - very important
               2 - moderately important
               3 - neutral
               4 - moderately unimportant
               5 - very unimportant

-- Only four means are legible for items 1 to 6 in the source
copy (1.48, 1.80, 1.84, and 1.97, in rank order); they cannot
be matched to individual rows with confidence.

More information is required in order to correctly
interpret the ratings of several items. For example, the
participants' response to "My expenses were paid," is
meaningful only if the participants' expenses were indeed
paid. When asked who paid for their attendance, 63 percent
responded that their employers paid for them. The mean
rating of this 63 percent of the participants for the item
(15), "My expenses were paid", was 2.89. This contrasts
with the overall mean rating of 3.28. Thus, those whose
expenses were paid rate this factor as having somewhat
more importance.

This situation also applies to the item, "My agency/
supervisor strongly recommended that I attend." Forty-five
percent of the participants indicated that their employer
recommended attendance. The mean rating for this group for
the above item (16) was 2.84, as contrasted with the whole
group mean rating of 3.44. Once again, the group for whom
the item was most relevant rated the item as slightly more
important.
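
The two contrasts above also imply a mean rating for the
remaining participants, since an overall mean is a weighted
average of its subgroup means. The short Python sketch below
works this out. It is an illustration only: it assumes that
every participant rated the item and that the reported
percentages are exact, and the function name complement_mean
is hypothetical.

```python
def complement_mean(overall_mean, subgroup_mean, subgroup_share):
    """Mean rating of respondents outside the subgroup.

    Uses the weighted-mean identity
        overall = share * subgroup + (1 - share) * others
    solved for the mean of the others.
    """
    if not 0 < subgroup_share < 1:
        raise ValueError("subgroup_share must be strictly between 0 and 1")
    return (overall_mean - subgroup_share * subgroup_mean) / (1 - subgroup_share)


# Item 15, "My expenses were paid": 63 percent had expenses paid by employers.
expenses_others = complement_mean(3.28, 2.89, 0.63)

# Item 16, "My agency/supervisor strongly recommended that I attend":
# 45 percent reported an employer recommendation.
recommended_others = complement_mean(3.44, 2.84, 0.45)

print(round(expenses_others, 2), round(recommended_others, 2))
```

Under these assumptions the participants for whom each item was
not relevant would have averaged a little over 3.9 on both
items, close to "moderately unimportant" on the rating scale.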



Mertens' (1979) results suggest two conclusions. First,
engineers enrolled in this course tend to rate the items
concerning acquisition of specific knowledge and skills and
professional advancement as most important for influencing
their participation in continuing education. These are
predominantly self-directed or intrinsic motives. Secondly,
external influences appear to be less important for
determining participation. However, the rating of these
external factors is directly influenced by the relevance of
the item for the individual participant. This suggests that
the course attracted a variety of different persons for
different reasons. This is probably a very common situation
for most courses in continuing education areas. Yet, as
established by the rank ordering of the items based on
participant responses, it is clear that the self-directed
learning motives basic to andragogy theory are predominant.

Conclusion

Earlier sections of this chapter have called attention
to the concern of many engineers for continuing education
courses of a practical nature. The Mertens study as well as
common experience suggest this is a strong concern. However,
this point should not be overemphasized. A survey of
engineers by Morris, Sherrill, and Scriven (1978) indicated
that 40 percent of the respondents would establish policies
to support a continuing education program in which up to
50 percent of the programs had non-job related content. This
finding is not surprising if one recognizes that most
engineers are curious persons who have strong and lasting
interests in many areas of science and technology, some
directly related to their work and others remotely related or
unrelated (Holland, 1973).

Many engineers enroll in courses specifically designed
to teach them knowledge and skills for a particular
engineering activity, even when that activity is not in the
domain of their work performance. It is common to find a
wide variety of persons from other engineering and
non-engineering vocations in courses such as "The Hydrology and
Sedimentology of Surface Mined Lands," a course designed
specifically for civil and mining engineers. A mechanical
engineer from an equipment manufacturing company may attend
because of curiosity and a desire to understand more clearly
the problems his clients face. Even though the engineer is
not anticipating any specific objective or outcome which
will result from his learning, he will frequently find the
course produces beneficial results in the future. The
course may help him better understand aspects of the work
of mining and civil engineers. It may cause him to read
more and add to and broaden his general knowledge and
appreciation of technology and science. Knowledge, like
money, is a very generalizable currency. Once it has been
acquired there are almost always exciting, worthwhile, and
often surprising ways to use it. Knowledge is even better
than money, because it is not consumed in use. Rather it
is strengthened. Perhaps the experienced engineer who is
a curious, lifelong learner, characterized by an andragogy
outlook, is more aware of this than most persons.

Continuing education courses in engineering need to be
offered in all four categories or typologies. The adult
characteristics of the participants need to be recognized.
The motives for continuing study should be understood by
instructors, and participants should be encouraged to enroll
in whatever types of courses they need for whatever reasons.
These different purposes of courses and the different
motives of persons within the same course insure there will
be many different learning outcomes for any given course as
well as for different types of courses. This confounds the
easy assessment of learning outcomes. The intention of the
course developers, the intentions of the participants, and
the actual operation and teaching-learning methodology of
any course all help determine what may or may not be learned
from a course. Evaluations of courses and their
effectiveness and individual assessments of course
participants' knowledge and skill must take these factors
into consideration.



Chapter 4

FORMATIVE EVALUATION AND COURSE DEVELOPMENT

Good continuing education engineering courses develop
gradually through several stages. Seeking and using
appropriate information about the need for such courses,
their presentation and effectiveness can lead to the
development of a course which teaches participants what they
need to know in a consistent and efficient pattern. The
collection and analysis of information about the early
operation of courses for the purpose of insuring a more
effective learning operation in the future is called
formative evaluation. It is the process whereby initial
course designs to meet the needs of participants are refined
and revised. The business of assessing learning outcomes
for improving a particular course requires different types
of information at different stages.

Stages of Course Development

Short courses of a continuing education nature often
develop in response to some particular need of practicing
engineers. This is especially so when the information needed
is not available from other sources. Perhaps an example
will help.

Surface mining procedures have become very widespread
in the last few years with the emphasis upon the use of coal
as a major energy source. During this same time, federal
and state controls on surface drainage systems and water
quality have become more strict. A problem has arisen in
that much of the existing knowledge for construction of
drainage systems and sedimentation basins for surface mining
operations is based on theoretical models and computational
algorithms developed for agricultural activities on generally
flat topography. Consequently, there are many serious
methodological problems involved in extrapolation of these
agricultural methods and models to mining operations in
areas of great topographic relief. The appropriate
adjustments and modifications of models and computational
algorithms, initially designed for flat land agricultural
drainage and sedimentation problems, did not exist.
Consequently, it was difficult for mining engineers working
in the coal industry to design proper temporary stream
channels and storage basins to insure compliance with Federal
and state water standards concerned with stream loads and
[...]
agricultural engineers concerned with the proper
modifications and elaborations of the earlier agricultural
[...]
and presented as a series of computational algorithms,
nomographs, charts, and procedural rules by which fruitfully
and accurately to apply earlier models, such as the
Universal Soil Loss Equation, to problem situations very
different than those for which the models were developed
originally.
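
For readers unfamiliar with it, the Universal Soil Loss Equation named above is commonly written as a product of six factors. The form below is the standard agricultural statement of the equation, given here only for orientation; it is not the set of modified procedures developed for surface mined lands, which this report does not reproduce.

```latex
A = R \cdot K \cdot L \cdot S \cdot C \cdot P
```

Here A is the estimated average annual soil loss per unit area, R the rainfall erosivity factor, K the soil erodibility factor, L and S the slope length and slope steepness factors, C the cover and management factor, and P the support practice factor. The slope factors L and S are where steep terrain enters the calculation, which is one reason extrapolation from flat agricultural land is difficult.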

This work took many years of effort by a few university
researchers. As the research developed, it became apparent
the new adjustments and modifications would be very helpful
to practicing mining engineering operations. Consequently,
graduate courses at a university began to be taught in
this area. The professors involved refined their models
and procedures and gradually produced a set of problems
and notes which became a course on "Hydrology and
Sedimentology of Surface Mined Lands." The course and the
notes evolved into a textbook and an extremely popular
continuing education short course for engineers concerned
with constructing drainage systems and sedimentation basins
for surface mining operations (Haan & Barfield, 1978). The
course is in demand because it represents knowledge and
skill which is not otherwise easily attainable, but which
nonetheless is central to proper compliance with sound
practice and with state and Federal laws. The persons
seeking this knowledge and skill include state and Federal
inspectors as well as many individuals from engineering firms
who design the temporary drainage and storage systems.



This pattern is not unusual in the development of
continuing education courses. Such courses arise out of
needs of practicing engineers for specific types of knowledge
and skill. The courses developed to meet these needs are
more likely to become effective if care is taken in assessing
the effectiveness of the individual course in meeting these
needs as the course is developed and refined through its
various stages.

There are a variety of procedures which provide the
information needed to improve developing courses through
formative evaluation procedures. These include small trial
offerings or pilot studies. These early experiences can
provide much information about necessary revisions in course
content, pace, duration, and presentation. Early informal
contact with engineers, who are faced with problems in their
work settings and who appeal to university or college faculty
for assistance in specific areas of technical expertise,
often precedes the development of a particular course or
courses. In this way, needs of practicing engineers are
often identified and later these needs may be met, in part,
through a formal short course or another continuing education
activity.

Selecting Course Content

One of the areas in which a pilot or trial use of a
continuing education course can provide information is
the appropriateness of the content included in the course.
Content selection is always a problem. There is always more
content than can be included in any course. Moreover,
continuing education courses are typically of short duration,
often consisting of intensive two or three day sessions,
workshops, or sometimes weekly sessions for an hour or two
over a period of several weeks. University professors often
have a tendency to include large amounts of content in such
courses. Participants often seek knowledge of much more
limited information and procedures. Just how much, and in
what depth, and in what breadth, should be included in the
content of a short course is often not possible to determine
until the course has been taught a few times. The
impressions of the participants about how useful the content
is in their work setting, as well as about the scope and pace
of the course, are important. This information should be
routinely sought in the early stages of course development.
Questionnaires and interviews of participants before and
after the course are useful to determine what they think they
need to know and how much they think they have learned.
Follow-up interviews and questionnaires to participants and
their employers after completion of short courses are also
important. Appendix A contains several actual questionnaires
used for this purpose by the Learning Outcomes Measurement
Project. These instruments may serve as useful examples to
persons with an interest in formative evaluation of similar
courses.



Oftentimes participants are introduced to methods and
procedures in short courses which can become very useful to
these persons when they return to their work setting, but
only after much additional practice following the short
course. If the procedures and methods are seen to be useful
by the participants, and if these persons are convinced from
course activities that they have the basic competence to
apply and properly use these procedures, it is likely that
the procedures will be practiced and perfected upon returning
to the work setting. This is particularly so if the short
course is designed to provide the participants with a packet
of materials and procedures to take back to the work setting.
These materials and procedures can include computer programs,
technical manuals, charts, tables, computational algorithms,
and many more types of procedures or information which make
the solution of certain problems easier and more technically
correct.

When the outcomes of a short course depend upon such
continuing use of skills and procedures, one should not
expect participants to be completely facile with the skills
and procedures at the end of the three-day short course. In such
a situation, assessments of the participant's knowledge of
how to study further and independently use the procedures in
the work setting and his or her willingness to do so are
important factors. Subsequent information about the degree
to which participants actually use the procedures and ideas
taught in the course in their work settings, as well as
information about the accuracy of the applications, is also
very important. What one learns from evaluations of short
courses by questionnaires and interviews with participants
immediately following the course, and with participants and
their employers some weeks after the course, is very
important information. It can be used to change course
content, teaching procedures, pace of instruction, and
content organization toward developing a more effective
course.



Refining Course Objectives and Learning Assessment Tasks

The objectives for courses also should be subjected to
formative evaluation procedures. Sometimes objectives can
be expressed in example problems or tasks which define
clearly what it is the participant will be able to do when
the course is completed. This is the case in the Haan and
Barfield (1978) course on hydrology and sedimentology. The
example problems and the problems to be worked at the end
of each unit of the course are very clear statements of the
types of skills and knowledge needed to solve this class of
problems. Furthermore, the complexity of the problems
cumulates until the problems in the final unit incorporate
knowledge and skills from each of the prior units. This
approach is very sound for it not only provides the
participant with a concise and clear objective (the problem
itself stated in its unsolved form) but it is also the means
of assessing the competence of the student. The student's
performance on these tasks is useful for diagnosing what has
or has not been learned, and what needs to be done next in
instruction. This approach is widely advocated for teaching
complex technical content (Gagne, 1967; Gagne & Briggs, 1974;
Webb, 1970).

The same problems, which functionally define the
objectives for the short course by presenting the participant
with problem tasks that he or she should be able to solve
after completion of the course, also define a series of
assessment tasks which may be used as pre-tests prior to
the course to determine where learners are entering in terms
of prior knowledge and skill. These same tasks can also be
used as post tests after the course to determine exit levels
of participants' knowledge and skill. They may also be used
as embedded assessment tasks during the actual course of
instruction to diagnose any particular learner's need for
additional instruction on any particular point. The same
sample problems also may be used in follow up studies long
after course completion, to determine the degree to which
course content and methods are being applied appropriately
in the work setting. To the degree that the sample problems
in the course are representative of the real problems faced
by the practicing engineer, actual samples of persons' work
in their job settings may be examined and scored on the
incorporation and correct use of course concepts and methods.
This type of assessment of learning outcomes for a continuing
education course is the most rigorous possible. It provides
information, not only on the degree of student learning,
but also on the degree of course effectiveness for different
types of participants with varying levels of prior experience
and education.
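
The use of the same tasks as pre-tests and post tests can be illustrated with a small sketch. It assumes each participant has a single numeric score on each test and a label for level of prior experience; the function names, scores, and group labels are hypothetical and chosen only for illustration, and a raw difference score is just one simple way of summarizing change.

```python
from collections import defaultdict
from statistics import mean


def score_gains(pre_scores, post_scores):
    """Raw gain (post minus pre) for every participant who took both tests."""
    return {p: post_scores[p] - pre_scores[p]
            for p in pre_scores if p in post_scores}


def mean_gain_by_group(gains, groups):
    """Average gain within each participant group, e.g. level of prior experience."""
    by_group = defaultdict(list)
    for participant, gain in gains.items():
        by_group[groups[participant]].append(gain)
    return {group: mean(values) for group, values in by_group.items()}


# Hypothetical scores on the same set of problem tasks (out of 10).
pre = {"A": 2, "B": 5, "C": 1, "D": 6}
post = {"A": 7, "B": 8, "C": 6, "D": 8}
groups = {"A": "novice", "B": "experienced", "C": "novice", "D": "experienced"}

gains = score_gains(pre, post)
group_gains = mean_gain_by_group(gains, groups)
print(gains)
print(group_gains)
```

The per-participant gains speak to individual learning, while the group averages give a rough indication of how well the course serves participants who enter at different levels.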

Traditional behavioral objectives are generally not
particularly useful for these important instructional
organization and learning assessment functions, unless they
are stated in terms of classes of performance outcomes rather
than as highly specific entities or "behaviors". The outcomes
of short courses, although narrow and focused, are not
usually specifics, but usually an area of general skills or
a class of performances. For example, typical outcomes
include the proper design of experiments, the construction of
runoff control and storage systems, or the assembly of
information and decision making devices from micro-processing
electronic components and the interfacing of these with
industrial machine systems. In each case it is not a
particular set of specific behaviors which are the intended
outcomes of instruction, but rather a class of generalizable
performances. No two experiments are identical, nor are any
two drainage and runoff systems, or any two industrial
production control systems. The desired outcome in each
case is a set of basic but generalizable skills. Having
learned such a set of generalizable skills, the course
participants should be able to better design any experiment
in a wide range of situations; design better runoff drainage
and storage systems across a wide variety of soil, topography,
and climatic conditions; and use micro-processing
equipment to control a wide range of machinery used in many
different industrial production operations.

The best route to this outcome is to insure an adequate
sample of different types of problems in which to train
participants to a criterion of mastery in the performance, with
special attention to inclusion of a sufficient range of
problem conditions so as to alert the participants to the
typical adjustments which must be made in theory, assumptions,
algorithms, or methodology. There is much support for this
approach to instruction (Bloom, 1976; Carroll, 1963;
Gagne, 1967; Gagne & Briggs, 1974; Manning, 1970).

There is always some minimum number of experiences
which the participant needs to encounter across some range
of variable problem situations if the skills are to be
learned in this generalizable fashion.

Formative evaluation of programs and courses can help
determine this optimal array of problem situations. It
should also be noted that the tasks used for testing knowledge
and skill of participants should be similar to the
tasks and problems from the actual domain under study. These
test tasks ought to be no different from the tasks used for
instruction, except that they have been reserved for testing
and they contain new problem situations not previously
encountered in this specific configuration. It is important
that the test tasks are not the same problems as presented
in practice and demonstrations in the course of instruction.
The real problems the engineer faces in his or her work
setting will be similar to but not identical to the problems
selected as instructional activities. By having similar,
but different test problem tasks, the course participant's
ability to abstract and generalize the general principles
and procedures presented in course learning activities to
other problem situations, not before encountered, can be
assessed. This produces a better estimate than would
otherwise be obtained about how well the engineer is able to
transfer material learned in the course to real world
problem situations.
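
One simple way to keep test tasks separate from instructional tasks is to reserve part of a common problem pool before the course is taught. The sketch below assumes the pool is a plain list of problem labels; the function name and the labels are hypothetical, and a random split is only one of several reasonable ways to reserve test problems.

```python
import random


def split_problem_pool(problems, n_test, seed=0):
    """Reserve n_test problems for testing; the remainder is used for instruction."""
    if not 0 < n_test < len(problems):
        raise ValueError("n_test must leave at least one problem on each side")
    rng = random.Random(seed)   # fixed seed so the same split can be reproduced
    shuffled = list(problems)   # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical pool of ten problems drawn from the same domain.
pool = [f"basin-design-{i}" for i in range(1, 11)]
instruction, test = split_problem_pool(pool, n_test=3)
print("instruction:", instruction)
print("reserved for testing:", test)
```

Because both sets come from the same pool, the reserved problems stay similar in kind to those used in instruction while never having been seen by the participants.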

Test tasks frequently need to be abbreviated, with
part of the problem being worked out, or the problem being
described and alternative approaches being presented, the
course participant having to choose the most appropriate
approach under the conditions stated. Such items can test
for high levels of comprehension and skill but not require
as much time as would working out an entire real problem.
More will be said about the construction of this type of
test task in later chapters. (An example of this type of
abbreviated test task designed for an actual continuing
education course in hydrology and sedimentology of surface
mined lands is found in Appendix D.) Of course, what is
lost with an abbreviated test task is a certain degree of
validity and thoroughness in the assessment. There is no
substitute for assessment based on actual evaluation of
real work performance but there are good approximations
which can be done more easily and more efficiently, and
which will indicate the presence or absence of minimal
comprehension, skill, and ability to solve typical problems
encountered in the work setting.

Measuring Only That Which is Appropriate

It is important to note that one does not need to
continue to collect all of the information one can about a
course after that course is reasonably well developed and
shown by its formative evaluation to be shaped in the
direction in which it operates most beneficially for
participants. For example, in the early stages of developing
a course, questions concerned with the appropriate
organization and pacing of content, the utility of the course
material for practitioners, and the correct emphasis and
amount of content are all of primary interest. After several
replications of a course and the collection of this
information through the routine procedures described earlier,
the course may be thoroughly developed and function well.
If this is the case, there remains only a need to assess the
learning outcomes of individual participants on a regular
basis for purposes of reporting to them and their employers
the progress of individuals. There is no need to do the
extensive assessment of how appropriate the course objectives
and content are, or how effectively the course is operated
and presented. Rather, over many replications of the course,
these other questions can be asked and answered through
appropriate observation and data collection procedures on an
occasional basis.

One of the most effective means to monitor the quality
and effectiveness of a well developed course over many
replications is the method of "item sampling" (Lord &
Novick, 1968; Shoemaker, 1973). Under this method the many
questions and test items which are useful to evaluating
course effectiveness are broken up into small sets. Course
participants are randomly selected to respond to one small
set of questions or test tasks. Over replications of a
course, much information may be gathered about all aspects
of the course with minimal demands upon the participants'
time and energy.
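
The item sampling procedure just described can be sketched in a few lines. The sketch assumes a pool of twelve items dealt into four sets and twenty participants per course offering; the function names, item labels, and counts are hypothetical, and the balanced random assignment shown is only one way to carry out the random selection.

```python
import random
from collections import defaultdict
from statistics import mean


def make_item_sets(items, n_sets):
    """Deal the full item pool into n_sets small, non-overlapping sets."""
    return [items[i::n_sets] for i in range(n_sets)]


def assign_sets(participants, n_sets, rng):
    """Randomly assign each participant to one item set, keeping set sizes balanced."""
    order = list(participants)
    rng.shuffle(order)
    return {p: i % n_sets for i, p in enumerate(order)}


def pool_item_means(responses):
    """Average each item over everyone, in any replication, who answered it."""
    by_item = defaultdict(list)
    for item, score in responses:
        by_item[item].append(score)
    return {item: mean(scores) for item, scores in by_item.items()}


items = [f"Q{i}" for i in range(1, 13)]            # hypothetical item pool
item_sets = make_item_sets(items, n_sets=4)        # four sets of three items
participants = [f"P{i}" for i in range(1, 21)]     # one course offering
assignment = assign_sets(participants, n_sets=4, rng=random.Random(1))

# Each participant answers only the items in his or her assigned set.
print(item_sets)
print(assignment["P1"], item_sets[assignment["P1"]])
```

Responses gathered in this way are pooled item by item with `pool_item_means`, so each replication of the course adds to the information available on every item even though no participant answers more than a few.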

The same principles apply to instructors of short
courses. Once a given instructor or instructional team has
successfully demonstrated that the course is taught in an
effective manner, instructor performance need not be
monitored very carefully during each subsequent replication.
Some form of routine and brief participant evaluation of the
instructor(s) should be continued to insure that participants
have the opportunity to communicate suggestions and criticisms
to instructors and to maintain instructor awareness and
sensitivity to student needs. If a wide variety of
information about the instructors is desired, a very long
instructor evaluation form can be broken up into 3 to 5
shorter evaluation forms. These multiple forms would each
contain different items. Course participants would be
randomly assigned one of the forms. Over replications of
the course a great deal of information about participants'
judgment of the instructor(s) can be gathered, again with
minimal expenditure of time and energy. In fact, item
sampling procedures usually yield more information than
traditional procedures where all students complete all items.
This is because the number of items included in item sampling
assessment measures can be much greater in total than is
possible in traditional assessment procedures. This means
that a broader range of the domain of interest may be assessed.

Of course, item sampling does not work well unless there
are many students involved. Many replications of the same
course provide adequate numbers of students. There is a
second limitation. Item sampling procedures are very useful
for making judgments about course effectiveness and the
skill of instructors. However, since not all students are
tested upon or asked to complete the same questions,
inferences about individual students' learning achievement
and attitudes toward the course are not possible. For this
reason item sampling procedures are useful for evaluating
and monitoring courses and their effectiveness. They are
not useful for making assessments of individual learners'
achievement. Therefore, it is usually wise to retain some
form of common but abbreviated test tasks which are
administered to all participants, usually in the form of a
test. Later chapters deal with the procedures appropriate
to developing and using tests and other assessment procedures
for the purpose of measuring and estimating the degree of
individual student achievement.

These same principles apply to the collection of
information about course participants. Initially it is of
great importance to know much about the participants who are
likely to be involved in a particular short course in the
future. The best way to do this is to monitor present
enrollments on the variables of interest. Important variables
concern the diversity of participants' expectations, prior
levels of skill and knowledge in the prerequisite skills and
content for the course, and participants' occupational history
and present level and area of professional activity in
engineering. If the participants in a given course are
extremely diverse with respect to necessary levels of
prerequisite skill and knowledge, the course will be very
difficult or even impossible to teach in an effective manner.
If a course is geared at too low a level or too high, if it
is paced too fast or too slow for a large number of
participants, it will encounter difficulty. Thus, early
attempts to learn much about the specific characteristics of
the population of enrollees for a course are important.

Once the characteristics of the population which is to
be served are known, adjustments can be made in the content
and pace of course offerings. Sometimes it is necessary to
meet diverse needs of participants by planning and offering
more than one course, some at a very technical and advanced
level and others at a more basic level. Again, data collected
routinely in the early replication of a continuing education
course, which initially provided information needed in the
formative evaluation of the course, may not need to be
routinely collected after the course is well established and
has a stable population of enrollees with similar
characteristics. The goal should always be to collect only
that information which is necessary to make adjustments which
are needed. Testing and other assessment procedures, such
as interviews, questionnaires, and evaluation of on-the-job
work performance, are costly and time consuming. After a
course has been well developed it is reasonable to sample
replications of courses and participants within courses on
specific questions to achieve good information about the
ongoing quality of the course and its outcomes. Time is
always limited and most of the time available for short
courses needs to be devoted to instruction.

This is not to say that testing or assessment of
participants' knowledge and skill in the area of course
content should not continue as courses are developed.
Continual use of pre-tests, test tasks embedded in the
learning activities of the course itself, and post tests
can be very important instructional methods. Proper use
of these methods can be very helpful to participants and
instructors in producing better learning outcomes. However,
if such testing is to continue, its purposes ought to be
directly related to instruction by letting learners and
instructors know the entry level knowledge and skill of
particular learners and the types of practice and assistance
most needed by particular learners to master certain areas
of the course, and by reporting to the learners and their
employers the amount of learning which has occurred after
the course on certain specified outcomes.

Packaging and Delivery Considerations

There are many ways to package and deliver short courses.
Some of these include on site training by an experienced
professional; the use of specially designed textbooks, workbooks,
programmed learning manuals, and other printed
materials; the use of demonstrations, films, audio and
video presentations, often with ancillary printed materials
and homework exercises; laboratory demonstrations and
activities (such as the assembly of components designed to
perform in a certain way after first receiving lecture and
individual instruction in the basic principles of the
components and the uses to which they may be put); and
working of sample and demonstration problems in groups or
individually with the assistance of instructors and other
experts.

The packaging and delivery of a course depends on a
number of factors including the nature of the material to
be taught; the geographic distribution of the participants;
the need for specialized equipment such as computers or
laboratory equipment; the complexity of the material to be
learned; the level of prior knowledge and skill required of
the participant; the availability of quality instructors;
and the expected duration or "life span" of the course.

Some things cannot be learned from books or manuals.
Some particular types of courses need to be taught as
supervised experiences much the way surgeons are taught the
finer points of various procedures. Other information can
be readily taught through lecture accompanied by appropriate
charts, graphs, and illustrations and followed by practice
exercises. Still other skills for other courses can best
be taught by individual study of printed materials and the
working of rather traditional format problems presented at
the end of sections of reading.






If the potential participants for a particular short
course are widely dispersed throughout different companies
over a wide geographic region, it is likely that a short
course conducted at a national or regional professional
conference or a special regional conference of a few days
duration will be most appropriate. In such cases, questions
of the location and adequacy of facilities and the dates and
times of the conference activities are important variables
which need to be considered. Not all times, locations, and
facilities may meet the needs of participants. Conversely,
if there is a large local population of engineers needing a
particular area of training, and all of these individuals are
located primarily within a few nearby companies or consulting
firms, local conferences or even extended programs meeting
weekly for a period of a few weeks are viable options.

It is often more cost effective to package a course so
that it can be mailed out and used locally by any of a
number of organizations and groups with any of a large
number of instructors, once a course is carefully developed
and found to be effective. A common method is to package
the program of study in the form of mail-out video cassette
lectures and demonstrations coupled with the appropriate
ancillary printed materials, workbook, and problem materials.
The initial cost of preparing the course in this form may
be high. However, the advantage is easy replicability at
many sites. If a course has to depend upon one person or
a small group of persons for its instruction, it will be
very limited in effectiveness by virtue of the available
time and energy of the instructor or instructors.

One good example of a course which is packaged in a
very cost effective way insuring wide replicability is the
"Design of Experiments." This course was developed by
John Van Horn of the Westinghouse Corporation and is
taught by Professor J. Stuart Hunter (Box, Hunter & Hunter,
1978).³ This course is very popular, since it deals with
functional skills of experimental design, a topic central to
the work of many practicing engineers. The potential
audience for the course includes engineers and other
technical staff in many industrial firms scattered throughout
the country. The packaging of the program in video
cassette lectures and demonstrations coupled with printed
materials for individual participant study is a highly
replicable format. Groups of engineers at any location
may convene, have the video cassette mailed out to a central
person, engage in individual study of the materials and
working of the problems provided, and jointly participate
in watching and discussing the lectures and demonstrations
on the video taped programs.

Before a course is this fully developed and packaged it
should have been well evaluated and modified to be very
effective as determined from this early formative evaluation
activity. Moreover, this much effort and money should not
be expended on a course unless it is a course which is likely
to continue to be needed by large numbers of participants
into the future. It is also important to recall a point made
earlier. After a course such as the "Design of Experiments"
is carefully developed, evaluated, and found to be effective,
there is little need to continue the same intense level of
evaluation activity. Rather, it is appropriate to sample
some particular instances of the course and some participants
to determine if the course continues to meet the needs of
participants and to obtain information on possible revisions
for future versions. Evaluation of individual participants'
learning is still appropriate as a continuous aspect of
the course in order to communicate to the learner and employer
an estimate of the degree of initial learning resulting from
participation in the course.

³ The course is owned and distributed by the Office of
Continuing Education and Extension, College of Engineering,
University of Kentucky, Lexington, KY 40506.

Selection of Instructors

Part of the packaging consideration concerns selection
of an instructor(s) for the teaching of the course.
Certainly technical expertise and competence in the content
area of the course are basic criteria. However, these are
not sufficient. The most appropriate instructor may not
necessarily be the most expert university professor, but
some other skilled practitioner. It is not that university
professors should be ruled out as appropriate instructors.
A great reservoir of talent resides in such persons. Rather
it is crucial to select from among those professors the
individuals willing to place the needs of the participants
first rather than the objectives of the instructor. It is
important to select those professors responsive to the needs
of participants rather than those bent on achieving their
own dearly held objectives regardless of participants' needs
or concerns.



In practice it often turns out that what practicing
professionals need to perform better is something other than
what their professional societies and leading academics
think they need. It is also often the case that the
practitioner needs a broadened understanding of theory and
principles in order to perform his or her work more wisely
and efficiently. In such situations it is the job of the
short course instructor to incorporate this theory and
increased understanding by careful selection of examples
and illustrations which clearly demonstrate how a knowledge
of the broader theory and relationships makes it easier to
solve the problems of practice in a more sound and integrated
way. Sometimes this can be done with anecdotes of stupid,
dangerous, or uninformed practice on the part of a person who
sees only a very little of the problem area for lack of a
broad enough grounding in relevant theory and concepts.
These anecdotes can sometimes help participants understand
the importance of theory to practice.

One example from another technical field is the medical
laboratory technician who called the repairman for the flame
photometer each week because the sodium readings on blood
serum samples and known calibration samples were highly
erratic. After a period of six weeks, during which the
equipment had been condemned as faulty, the technician in
preparing to wash glassware was observed energetically
shaking large amounts of soap powder into water in a sink
near the photometer. Soap dust was everywhere, making
everyone sneeze, settling on the funnel in the flame
photometer, and being aspirated with the next sample, resulting in
a very high sodium reading. The technician apparently thought
sodium was something peculiar to human blood serum and
fluids, and made no connection between the soap powder and
the high readings. Yet the same technician was extremely
careful to rinse all glassware three times with distilled
water to insure no contamination of equipment. There must be
many good examples of too narrow an understanding of
engineering principles which lead to inappropriate practice.
Instructors of courses for graduate engineers ought to be
able to provide many similar examples and convincing arguments
that oftentimes the best route to better practice is a more
solid understanding of theory and major concepts.






Specialized Equipment and Facilities Implications For
Course Delivery and Learning Assessment

Still another consideration in the packaging of
continuing education courses is the need for specialized
equipment and facilities. For example, a short course on
the latest developments in microprocessing equipment and
the use of that equipment in the control of industrial
machining processes may require special equipment and
facilities. First, Heath kits or some similar self-
instructional packet of equipment may be useful and even
necessary as an instructional activity within a laboratory
setting. Second, the availability of a central computer
facility with the capability of simulation of industrial
systems control processes may be necessary in order that
participants may test and revise the logic of the programs
they prepare for their control units. Third, an actual
field trip or demonstration of a number of current
applications of microprocessor control systems to industrial
production processes may be desirable.

On the other hand, if one is teaching a course on general
principles of engineering economics, the only facilities and
specialized materials needed may be a text book, a set of
programmed instruction work sheets and problems, and some
common lectures for participants. The lectures can be
presented live, filmed, or on video tape. They can be
disseminated by mail, telephone lines, or communication
satellite, as was the case in a recent project in the
Appalachian region (Mertens, 1979).

In short, the need for specialized equipment and
personnel, as well as the complexity of the material, dictates
much about the packaging of the course, including such
things as the mode of instruction (laboratory, lecture,
workbook, fieldtrip, and combinations of these); appropriate
types of instructors (university professors, practicing
engineers, or other specialists); location of the course near
central facilities or dispersal of the course throughout
a wide geographic region by use of printed materials, films,
instructional kits, and other media; and the number of
replications anticipated for the course in the future.

The evaluation of learning outcomes for different types
of course delivery and packaging is also different in a
number of ways. If one is offering a course on the assembly
and use of microprocessors it is foolish to test only with
a paper and pencil test. Much more appropriate and useful
in diagnosis of the learning outcomes is some form of
practical laboratory test, usually shortened and abbreviated
in scope, but nevertheless sufficient in assessing basic
skills and knowledge in a performance area. Yet in a course
on "law and engineering", a pencil and paper test consisting
of multiple choice items which require the participant to
make judgments about the legality of certain engineering
procedures in industrial manufacturing in specific cases
presented in individual items is a perfectly valid approach
and may provide the means to assess a much wider range of
knowledge and skill than would a more performance oriented
practical examination.

Practitioners' Tacit Evaluation of Courses

The most convincing data on the degree of individual
participant learning is improved performance in job activities.
It matters very little whether or not the engineer's improved
work performance resulting from the course is actually
formally measured as long as employers and their employees
who have completed the training observe that improved
performance results. As mentioned earlier, it is often such
informal tacit evaluation of the effects of continuing
education courses which is the most functional type of
evaluation. If past participants and their employers see
evidence of the utility of a course in improving skill and
performance of practicing engineers, the word gets around
and the course becomes heavily subscribed in the future. If
this utility is not perceived, no matter how "proper" the
formal evaluation of the course and its effectiveness, the
course will not be heavily subscribed.

The point is that in the development of such
courses, the tacit evaluations of participants and employers
concerning the utility of the course ought to be sought and
attended to. One can argue that this is an attempt to
"please" the client, a tactic which some academic professors
in specialized engineering sciences may not approve. However,
it must be remembered that the companies and individuals
who enroll in continuing education courses in engineering do
so only by virtue of taking time off from work, at corporate
and personal cost in terms of dollars, time, and energy, and
in the pursuit of meeting particular types of needs related
to improved work performance. Therefore, what the individual
participant and his employer want, and especially what they
think to be the value of a particular continuing education
course, are the major criteria for course effectiveness.

Conclusion

All of the previous material should make it clear that
one does not simply evaluate the learning outcomes for
participants for any continuing education course. One must
also attend to and carefully evaluate the characteristics
and effectiveness of the courses which are developed to
meet the needs of practicing engineers. The functional
learning outcomes of any course depend not only on the
learner, but to a large degree, upon the structure and
organization of the course. Careful formative evaluation
activities, like those described in this chapter, help
insure that a course will eventually be well developed and
result in significant learning by individual participants.
Without this early formative evaluation and subsequent course
revision, it is doubtful that even elaborate attention to the
development of "tests" of participant learning resulting
from the course will be meaningful. When any population of
students seeks out instruction to meet their own perceived
needs in specific areas related to their work and at
personal cost to themselves, the major criterion by which the
instructional experiences will be evaluated is the perceived
utility of the course to the work performance. The
perceptions of participants and employers on this matter of
perceived utility are paramount. Proper attention to
formative evaluation procedures in the development and
packaging of continuing education courses can make it very
likely that particular courses will, indeed, result in
improved practice in some job related area.

After all of this is accomplished, it is appropriate to
attend to the formal and routine evaluation of the learning
outcomes of individual participants who have enrolled in and
completed particular short courses and other continuing
education experiences.

The next section of this document deals with the
development and use of summative evaluation procedures, the
means by which the progress of individual learners is
reported to them and their employers, and also the means by
which the general effectiveness of particular courses and
programs is reported to professional societies; governing,
academic, and standards boards; and others having a strong
interest in the quality of individual courses in the
continuing engineering education arena.




Chapter 5
SUMMATIVE EVALUATION OF LEARNING OUTCOMES

Earlier portions of this book have argued that the
learning assessment procedures should be tightly integrated
into the development and ongoing operation of the course.
This is because the best tasks for assessing learning
outcomes of students belong to the same universe of tasks
from which instructional activities and performance
objectives are sampled. That is, even if it is clear in
general what is to be taught in some complex performance
area, one still always must sample some finite number of
specific topics to be studied, learning activities to be
performed, and sets of instructional materials and
methodologies to be used. When one does this sampling, one
tries to select an adequate range in topics, materials, and
activities, to insure that the learners can generalize the
knowledge and skill they gain from the instructional
experience. It is not possible to teach all instances and
applications of any particular knowledge and skill which
may be the intended outcomes. It is only possible to
sample examples wisely, and to provide the student with an
appropriate breadth and depth of how the knowledge and skill
may be usefully applied. If this sampling is done well it
is likely that, having completed a course of instruction,
learners will transfer or generalize the knowledge and skills
acquired, through exposure to a finite number of carefully
selected learning experiences, to many other situations
related to their work activity and job performance.

Because of the complexity of developing courses it is
best to refine early instructional organizations given
information about the effects of the courses on learning
outcomes of course participants. This formative evaluation
is an ongoing process essential in the early stages of
course development, but it remains important throughout the
lifetime of a particular course offering.

Need for Summative Evaluation

There is, however, a second major purpose for evaluation.
There comes a point when a particular individual who has
participated in a course wants to know how much and how well
he or she has learned. Employers who send engineers to
participate in continuing education courses also want to
know, in specific terms, if their employees have learned
anything and if so, how much. In addition, course
instructors are frequently asked to certify that students
have acquired a set of skills or performance capabilities.
They too need information about the degree to which
individuals have learned the material and skills taught in
particular courses (Enell, 1980, pp. 186-187).

No matter how well the formative evaluation activities
are carried out in the development of a course, and no
matter how generally effective a course is shown to be in
terms of its overall effectiveness (summative evaluation),
there remains a need to know something about the degree of
each individual's learning following completion of a
course. Therefore, it is necessary to assess individual
learners' achievements in any course in order that learners
themselves, or others designated by the learners, can be
provided with specific information about how well the course
operated for individual persons. In short, persons who
attend courses, their employers, and perhaps the professional
societies to which the individuals belong, often want
information about the amount of learning which has resulted
for participants in a continuing education course.

The common way to obtain such information is through
various forms of testing. Alternative performance
assessment methods also exist. The information gathered
from such procedures is never exact in terms of reporting
the degree of learning of participants. As argued earlier,
such measures of learning outcomes are only estimates.
However, if tests and other performance assessment
procedures are properly designed and administered, they can
provide good estimates of the degree of individual learning
on specific areas of knowledge acquisition and skill
development. Furthermore, information of this type,
collected and pooled across all individuals enrolled in a
course, can be used to make statements about the general
effectiveness of the course in achieving its intended
learning outcomes with specific groups of enrollees in
specific areas of skill or knowledge. When used in this
manner the evaluation is called "summative" because the
description of the course effectiveness summarizes its
overall impact on learning outcomes of participants.
However, the information of a summative evaluation can also
be used in a formative way. Aspects of the course may be
modified in future replications, given data on present
effectiveness. When used this way the process is called
"formative" evaluation because the future characteristics
of the course are shaped on the basis of data about present
effectiveness.
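
The pooling step described above can be illustrated with a
short sketch. It assumes that each participant's post test
has already been scored as a proportion correct in each skill
area. The participant labels, area names, and numbers are
hypothetical and are not taken from any course in this report.

```python
from statistics import mean

# Hypothetical post test results: proportion correct per skill area
# for each participant (labels and values are illustrative only).
scores = {
    "participant_a": {"theory": 0.80, "laboratory": 0.60},
    "participant_b": {"theory": 0.70, "laboratory": 0.90},
    "participant_c": {"theory": 0.90, "laboratory": 0.75},
}


def pool_by_area(individual_scores):
    """Average each skill area across all participants."""
    areas = {}
    for per_person in individual_scores.values():
        for area, value in per_person.items():
            areas.setdefault(area, []).append(value)
    return {area: mean(values) for area, values in areas.items()}


# Course-level summary: one pooled estimate per skill area.
course_summary = pool_by_area(scores)
```

The individual entries in `scores` are what would be reported
to each learner. The pooled `course_summary` is the kind of
figure that could support a summative statement about the
course, or a formative decision to revise a weak area.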

Differences Between Summative and Formative Evaluation

Summative and formative evaluation procedures differ
primarily in how and for what purposes the information
obtained from assessment activities is used. Formative
evaluation has as its central concern the adaptation and
modification of early and ongoing instructional design and
actual operation of a course toward improving course
effectiveness. Summative evaluation has as its main focus
the reporting of the success of individual course enrollees
in achieving specific knowledge and skill at some point in
time after the completion of a course. Summative evaluation
is also concerned with reporting the typical level at which
a course achieves its intended learning outcomes for its
enrollees at some given point in time following instruction.




The next four chapters, or Part II of this book, detail
different types of testing procedures useful for assessing
learning outcomes. The testing procedures presented are
used for both formative and summative evaluation purposes.
Therefore, the procedures are presented in relation to the
assessment of the degree of individual student learning
and also as a means to estimate the general effectiveness of
a particular course in achieving its intended outcomes with
a particular group of enrollees.

The four testing procedures presented include pre-tests,
tests administered prior to instruction; embedded tests,
tests included in the course of instructional activities;
post tests, tests administered following instruction; and
delayed post tests and other performance assessments,
administered long after instruction when the individual has
had time to actually apply and use in his or her work activity
the knowledge and skill encountered in the course. Other
learning assessment procedures, in addition to testing, are
also presented. Different types of tests and their multiple
purposes and uses are also discussed in this major section
of the text.

Developing Sound Learning Assessment Procedures

Part III of the text consists of three chapters
concerned with alternative methods for developing effective
testing procedures for measuring learning outcomes. Chapter
10 explains in detail how to develop and use tests and other
performance assessment procedures in the process of formative
evaluation of courses. The procedures presented offer a means
to develop good courses and good learning assessment
procedures through an integrated process of developing
instructional objectives, content, and methods along with
the learning assessment tasks. The main purpose is to design
valid learning assessment procedures which will properly
serve formative and summative evaluation procedures.

Chapter 11 describes procedures for conducting test
item analysis and test reliability studies to insure high
quality learning assessment instruments. Duplicate
procedures are provided for two common approaches to the
evaluation of learning outcomes from instruction. These are
the norm referenced and the criterion referenced (or mastery
learning) approaches. The latter approach is the one
recommended for continuing education courses for engineers
for a variety of logical reasons which are presented.

Chapter 12 details limitations of tests, no matter how
well they are designed. The purpose of this chapter, along
with other sections in the remaining chapters, is to prevent
the abuse or misuse of testing in the assessment of
individuals' learning and in the formative and summative
evaluation of learning outcomes for courses and programs.






Chapter 13 is Part IV of the text. It details how to
collect, integrate, and report data gathered from multiple
measurements of learning outcomes for purposes of formative
and summative evaluation of courses as well as for reporting
the degree of learning of individual students.

Collectively these remaining chapters provide detailed
procedures and examples for developing sound learning
assessment procedures for whatever purpose, be it formative
evaluation toward course improvement, summative evaluation
to describe overall course operation and degree of
effectiveness, or reporting of individual learner achievement
to course participants and persons they designate.




Chapter 6
PRE-TESTS, THEIR PURPOSES AND USES

Pre-tests have three basic uses. First, they may be
used to inform prospective enrollees of the content of the
course and the necessary level of prerequisite knowledge and
skill required for successful completion of course activities.
Second, they can provide course developers with information
about the general entry level knowledge and skills of
enrollees, information useful in making subsequent
adjustments in course content, rate of presentation, and
emphasis. A third function is to provide baseline information
for each learner from which to make inferences about the amount
of learning of individuals and groups through subsequent
comparison with the results of performance measures and tests
administered during or following the course. Each of these
three functions of pre-tests will now be examined.

Pre-tests as Informative and Screening Devices

Pre-tests can be used in a screening fashion to insure
that persons who enroll for a course meet the necessary
prerequisite level of knowledge and skill. For particular
courses, short pre-tests can be mailed out along with course
announcements to allow prospective enrollees to make their
own private assessment of their readiness for a particular
course and to decide what to do to prepare for a course to
insure readiness for its learning activities. Other methods
of informing learners of the prerequisite levels of knowledge
and skill also can be used. These include a simple statement
of what types and levels of knowledge are required, or a
listing of particular prerequisite courses or experiences
which should have been completed. The listing of
prerequisites is a common practice in engineering continuing
education.

Whether or not to use a pre-test as a screening device
depends upon the nature and content of the course. In
Chapter 2 continuing education courses for engineers
were divided into four categories. These included courses
with a focus on a) upgrading and remediation of basic
knowledge and skills; b) broadening and extending previously
learned scientific and technical concepts and skills; c)
imparting new and advanced levels and skills in specialized
technical areas at very high levels of expertise to keep
pace with new developments in technology; and d) introduction
to new areas of knowledge and skill outside of basic
engineering practice such as economics, management,
human relations, community development, and environmental
protection.

A simple brochure describing the purpose of the course
and the nature of the content and learning activities is
probably sufficient for courses in the "d" category. This
may also be true for courses in category "b".
However, in category "c", where there are expectations for
very high levels of technical expertise in given areas
prior to course entry, it may be reasonable to design and
mail out a short pre-test which actually tests for this
prerequisite skill and knowledge. A scoring key can be
provided the prospective participant. After completion of
the pre-test, he or she can decide if the course is too
elementary or too advanced. Suggestions can be included
in the materials sent to prospective enrollees concerning
how to meet prerequisites. These would often include studying
certain procedures published in specific journal articles
or manuals; working with particular types of problems or
equipment; and completing other specific short courses or
formal learning experiences to master required basic
concepts and skills.

It is also reasonable to use a pre-test in relation to
courses concerned with upgrading and remediation of basic
skills and knowledge. In this case, a pre-test can either
be mailed out to prospective enrollees or administered
at a common location prior to the course. If security of
test items is an important consideration, the latter
alternative should be used. Here the pre-test should consist
of a set of items similar to those included on the "final
examination" for the remediation course. In situations
where the remediation course is designed to prepare students
to pass licensing examinations, both the pre-test and the
course final examination should be a sample of items typical
of those found on the licensing examination itself. It may
also be advisable to include a sample of diagnostic items on
the pre-test to assess each individual's knowledge and skills
in basic mathematics and physical science concepts which
are required to successfully complete the problems presented
in test items on the licensing examination. If the pre-test
items are properly sampled from across basic diagnostic
areas and the content of the remediation course, the
prospective participant, as well as the instructor, can
determine whether he or she needs to invest the time and
effort in enrolling in the course. If the person performs
very well on the pre-test, enrollment in the course would
probably be non-productive. On the other hand, if the
individual performs poorly, he or she would probably be well
advised to enroll in the course. These procedures can be
particularly helpful to persons preparing for professional
licensing examinations.
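
One way to read "properly sampled from across basic diagnostic
areas and the content of the remediation course" is as a
stratified draw: the same number of items is taken from each
area of an item bank. The sketch below shows that idea only.
The item bank, the area names, the item labels, and the choice
of two items per area are all assumptions made for
illustration, not a procedure given in this report.

```python
import random


def sample_pretest(item_bank, items_per_area, seed=0):
    """Draw up to items_per_area items from every content area."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    pretest = {}
    for area, items in item_bank.items():
        k = min(items_per_area, len(items))
        pretest[area] = rng.sample(items, k)
    return pretest


# Hypothetical item bank: item labels grouped by area.
item_bank = {
    "basic mathematics": ["M1", "M2", "M3", "M4"],
    "physical science": ["P1", "P2", "P3"],
    "course content": ["C1", "C2", "C3", "C4", "C5"],
}

pretest = sample_pretest(item_bank, items_per_area=2)
```

Because every area contributes items, a low score can be
traced to a particular area, which is what allows the
prospective participant and the instructor to judge whether
enrollment is worthwhile.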



Pre-Tests as Devices for Adjusting Course Content
and Operation

Pre-test information collected routinely on course
participants at the beginning of the course can be very
useful in providing information on any adjustments which
need to be made in the focus of instruction, the rate or pace
of presentation, and the level of complexity and difficulty
of course content. Much has been said about these formative
evaluation functions earlier. Additional examples of how to
use pre-tests for this purpose are presented in Chapters 11
and 13. Here it will suffice to add that the information
collected from the administration of short pre-tests at the
beginning of courses across replications is very useful for
this purpose. Pre-test and post test scores for a course may
be measured and plotted similar to the example presented in
Figure 1 in this chapter or Figures 2, 3, and 4 in Chapter 13.
If similar plots are made over several replications of a
course, and if the tests are valid and reliable measures of
specific learning outcomes, much information about a course
and its effectiveness under different conditions and with
various instructional adjustments in methods, instructors,
duration, and other factors, can be determined.

Pre-tests as Indicators of Baseline Performance

The third function of pre-tests, to provide baseline
information about the amount of knowledge and skill with
which participants entered the course, makes it possible
to compare the course participants' performance on post
tests with the earlier measures. An example may help
illustrate this point. Figure 1 is an actual plot of the
pre-test and post test scores of 13 medical laboratory
technologists enrolled in a short course. The course was a
six hour intensive workshop activity presented in two
major sections over the course of one day. The course
content was principles and techniques for enhancing student
motivation and achievement in technical courses through use
of appropriate instructional designs and teaching
methodologies. All 13 persons were instructors of technical
and clinical courses in medical laboratory technology in
college and university programs.

[Figure 1. Pre- and Post Test Scores by Persons With Test
Means, Standard Deviations, and Persons by Score Regression
Lines for a Short Course. Vertical axis: test score.
Horizontal axis: persons ordered by rank of post test score.
Legend: individual post test scores (n = 13); individual
pre-test scores (n = 12); test means; person by score
regression lines; test standard deviations.]

The pre-test and post test score of each person is 
plotted against the rank of that same person on the post 
test score. A quick glance a^ the graph sho.ws each 
participant's entyy level knowledge of course 'content as 
measured by the pre-test. Also shown is each individual's 
exit level knowledge as measured by the post test. 'The 
horizontal lines represent the means of the' pre- and po! 
test scores^fpr the group. These mean scores clearly, 
illustrate that the performance of participants in the course 
improv^ following instruction. ' In addition, a regjression 
line may be fitted to ejich set of scores. The slopes of the 
,two lines" reveal information- about the differential 
effectiveness of the course fat persons of dif f ering^ability 
levels as defined \>y their rank on the post test- or by some 
other functional perfojnnance criterion. Statistical 
significance tests may be performed on the differences between 
, • - -90- . ■ • 

111 



pre- and. post test means for repeated measure situations. 
in adclffion, statistical ; significance tests may be performed 
^ on th^difference in sl^e of the regression ' lines for the 
pre- ^-xkxd post test scores.. These statistical inference 
procedures can lead to-power'ful generalizations about the 
general effectiveness of kny given course in improving the 
performance of participants., Replication of results such 
as those shown in Figure 1 across many trials of a given 
course are even mor.e convincing of the effectiveness of a 
course than are the^- many statistical inference procedures 
which are possible.^ Chapter 13, pages 270 through 280, 
contain the pre- and post test results^ for groups of 
engineers enrolled in a ^ short course on urban water quality 
modeling. Figures 2, 3, and 4 provide data for three 

9 ' ^ ^ 

replications Of the course. Inspection of th'e graphs show» 

* 

thai: the course is effective and consistent. . ' 
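
The repeated measures comparison of means and the fitting of
a regression line to each set of scores can be sketched in a
few lines of Python. This is only an illustration under
stated assumptions: the scores are hypothetical (they are not
the Figure 1 data), the function names paired_t and slope are
invented here, and only persons with both a pre-test and a
post test score can enter the paired comparison.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (mean gain, t statistic, degrees of freedom) for matched scores."""
    gains = [after - before for before, after in zip(pre, post)]
    n = len(gains)
    mean_gain = statistics.mean(gains)
    sd_gain = statistics.stdev(gains)  # sample standard deviation of the gains
    t = mean_gain / (sd_gain / math.sqrt(n))
    return mean_gain, t, n - 1


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical scores for six participants, ordered by post test rank.
pre = [2, 3, 3, 4, 5, 5]
post = [6, 7, 8, 8, 9, 10]
ranks = list(range(1, len(post) + 1))

mean_gain, t, df = paired_t(pre, post)
pre_slope = slope(ranks, pre)
post_slope = slope(ranks, post)
print(f"mean gain = {mean_gain:.2f}, t = {t:.2f}, df = {df}")
print(f"pre-test slope = {pre_slope:.3f}, post test slope = {post_slope:.3f}")
```

The t statistic would then be referred to a t table with the
stated degrees of freedom. A formal test of the difference
between the two slopes requires more than is shown here; the
sketch only computes the slopes themselves.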

If no pre-test had been given to the participants shown
in Figure 1 or to those shown in Figures 2, 3, and 4, it
would have been impossible, on the basis of the post test
alone, to determine the amount of learning taking place or
even if learning had occurred. The score on the post test
would be uninterpretable because one would be unaware of the
entry level ability of participants. In the absence of the
pre-test, the inference could be made that the post test was
too easy and did not measure the learning which had occurred.
Another inference could be that the post test was of
appropriate difficulty and that persons had learned from
the course. Still a third inference could be that the post
test was of appropriate difficulty, but that the
participants were already knowledgeable of the content of
the course and might have scored this high or higher on a
pre-test. This last inference would cast doubt on the
appropriateness of the course for the participants. The
point to be made is simply this: without the pre-test data
it is difficult to choose among the alternative
interpretations of the post test results. Of course, a fourth
inference could be that the post test score is unrelated to
what the course participants learned. The validity of the
test is questioned in this case. It goes without saying that
any inference about how much persons have learned based on
test scores needs to be derived from valid and reliable tests.
Chapter 10 details methods by which to insure that tests are
developed which are reasonably valid and reliable. Chapter
11 points out the limitations of even the best tests and
provides suggestions for broadening one's inferences about
learning outcomes by using other appropriate and multiple
indicators.

In reality, there are always other indicators of the
degree of participants' entry knowledge and the amount of
learning experienced during a course besides pre- and
post test score comparisons. These include: the amount of
learning course participants report, the logical
determination of the appropriateness of the post test or
other learning assessment used by the course instructor, the
rate and accuracy with which participants are able to perform
the instructional activities and exercises during the
course of instruction, and other similar indicators such as
the ability and willingness of the course participants to use
properly the knowledge and skills acquired in their daily
work activities. Both earlier and later chapters of this
book make it clear that these other indicators of learning
should be systematically collected and used in making
judgments about the effectiveness of a particular course as
well as about the degree of learning achieved by individuals.

However, this does not negate the need for more formal
assessments of entry and exit level knowledge and skill.
Information of the type presented in Figure 1 is very
helpful, not only to the persons who operate the course and
are concerned about its general effectiveness, but to the
individual learner who can determine his or her own amount
of learning as measured by the pre- and post tests, in
relation to other course participants' performances and in
relation to some level of mastery of the course content and
skill as defined by the course developers or some other
criterion such as common standards of practice (Lacefield, 1980).
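
The kind of report an individual learner might receive, with
his or her gain set against the group and against a mastery
level, can be sketched as follows. The function name
learner_report, the score dictionaries, and the mastery cutoff
of 8 are all hypothetical; the report itself does not
prescribe a particular format.

```python
def learner_report(pre, post, mastery):
    """Summarize each learner's gain, gain relative to the group, and mastery.

    pre and post map a learner's name to a score. Learners missing either
    score are left out, since no gain can be computed for them.
    """
    gains = {name: post[name] - pre[name] for name in pre if name in post}
    group_mean_gain = sum(gains.values()) / len(gains)
    return {
        name: {
            "gain": gain,
            "gain_vs_group": gain - group_mean_gain,
            "mastered": post[name] >= mastery,
        }
        for name, gain in gains.items()
    }


# Hypothetical scores; learner D took only the post test.
pre_scores = {"A": 3, "B": 5, "C": 4}
post_scores = {"A": 8, "B": 7, "C": 9, "D": 6}
report = learner_report(pre_scores, post_scores, mastery=8)
for name, row in report.items():
    print(name, row)
```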

Learning Resulting from Pre-test Experience

Generally, if pre-tests are to be used, they ought to
consist of individual items or assessment tasks parallel to
but different from the items or tasks used in the post test.
One does not care if students learn from experience with a
pre-test. In fact, this is often a beneficial outcome. If
the pre-test helps participants better understand what it is
they need to know or what it is they need to learn to do, it
can be very facilitative to the instructional activity which
follows. Pre-tests can often define for the learner, in very
operational terms, what needs to be attended to most in the
instructional activities which follow. However, different
items and assessment tasks are required for the post test
because one does not want to be measuring, only or largely,
increased knowledge and facility with specific test items
or specific assessment tasks.

If one administers a pre-test, corrects the test, and
reports the results to the learner, post test scores on the
same test items given at some later time will be higher.
This is generally true even if no instruction intervenes.
What has happened in such cases is that individuals have
learned specific responses to specific test items or
performance tasks. They may have learned some other more
generalizable things as well, such as how certain terminology
is used or the style of the test developer, or they may have
obtained a better idea of what it is that the course
instructor deems important enough to test for. Therefore, if
one gives the identical test as a pre-test and post test, any
gains in learning which appear are confounded. The
confounding is between increased performance due to
familiarity with specific aspects of test tasks and items and
increased general knowledge and skill in using and applying
the concepts and principles learned in the course. Because of
the confounding one does not know if the post test score
represents only specific learning of how to get these items
correct or generalization of basic knowledge and skill
concerning ways to solve these types of problems. While the
latter outcome is a proper goal of instruction, the former is
not. Consequently, care must be exercised to make pre- and
post tests parallel to one another, but to consist of
different items. Otherwise the observed differences in pre-
and post test scores will not be easily interpretable.

There are a number of evaluation designs which can be
used to measure learning outcomes without recourse to
pre-tests. Some designs even allow for the effect of learning
from the pre-test to be separated from learning effects
resulting from instruction as both are reflected in post
test scores (Mason & Bramble, 1978). These evaluation designs
will not be described here. However, they are especially
useful when a course is to be replicated several times and
different evaluation designs can be used with each replication
to estimate various contributions to effects measured by
post tests or other performance outcome measures. It
suffices to conclude that, while it is not always necessary
to use pre-tests, if properly constructed, pre-tests serve a
number of important functions which make them worthwhile.


There are two general approaches to the construction of
different but parallel forms of pre- and post tests. The
first method is to define a sample of performances which the
individual should be able to carry out after instruction.
From this sample of performances, specific assessment tasks
and test items can be prepared without regard to
assignment to pre-test or post test use. In fact, as
mentioned earlier, the listing of this sample of performances
can also be useful for defining an appropriate range of
learning activities and experiences by which to instruct the
learner. In this sense, test items or other forms of
assessment tasks, as well as the activities and exercises
selected for inclusion in course content by which to instruct
learners, are all samples of the class of performance
capabilities. Learners should be able to apply the specific
concepts, skills, and knowledge they have learned in a course
to the successful completion of real life problems in the
content area under study or to test items which simulate
these real life problems.

Under such a plan some of these performance tasks are
assigned to a pre-test role, others to a post test role, and
still others serve as instructional tasks or practice
activities. In such a case, the course designer will select
several sets of tasks much alike in terms of the levels of
difficulty, the types of concepts and skills which are being
applied, and the types of problem situations they represent.
The pre-test, the sample of examples and practice activities
selected as course content, and the post test ought to all
reflect the same breadth of difficulty levels and areas of
application. Pre- and post test comparisons of participant
performances, as well as the accuracy and ease with which
instructional practice activities are completed during the
course of instruction, are very useful in making inferences
about the amount of learning resulting from instruction. If
such a plan is followed, it is important to insure that both
the pre- and post tests contain a proper distribution of test
items or assessment tasks. The problems presented in test
items must require application of course concepts and
principles across a range of typical conditions representative
of those encountered in actual work areas.

In this first approach, there is no attempt to prepare
duplicate pairs of items to serve pre- and post test
functions. Rather the entire domain of performance being
taught in the course is conceptualized as a universe of
multiple performance capabilities. The course designer
samples from the domain many specific tasks which collectively
represent what it is a person must be able to do to exhibit
mastery of the performance domain. Many individual items are
developed, with attention to achieving a proper distribution
of items across various levels of difficulty or complexity.
Assignment of items to the pre- or post test role is random
with no particular attention paid to matching each item on
the pre-test with an equivalent item on the post test.
Taken as a whole the two test forms are considered equivalent
because their items are drawn from the same universe, not
because they consist of individual paired items. Items on
both forms of the test are sampled in a uniform manner across
the domain. Either form of the test may be used
interchangeably in any test role: pre-test, embedded test,
post test, or delayed post test. This method works best when
the domain being tested is very broad and the tests developed
are quite long. This approach to parallel form test
construction is very useful in courses in the first category
concerned with remediation of basic knowledge and skill.
Here the performance domain is very large and many parallel
form tests can be sampled from that domain with little or no
duplication of individual items.
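
The first approach can be sketched as a random split of an
item pool into two forms. This is only an illustration: the
function name split_into_forms, the item identifiers, and the
difficulty labels are hypothetical, and splitting within each
difficulty level is one assumed way of keeping "a proper
distribution of items across various levels of difficulty"
in both forms.

```python
import random


def split_into_forms(items, seed=0):
    """Randomly split (item_id, difficulty) tuples into two equal-length forms.

    Items are shuffled within each difficulty level and dealt half to each
    form, so both forms carry the same number of items at every level.
    No attempt is made to match individual items across forms.
    """
    rng = random.Random(seed)
    by_level = {}
    for item_id, level in items:
        by_level.setdefault(level, []).append(item_id)

    form_a, form_b = [], []
    for level in sorted(by_level):
        pool = by_level[level][:]
        rng.shuffle(pool)
        half = len(pool) // 2
        form_a.extend(pool[:half])
        form_b.extend(pool[half:2 * half])  # an odd leftover item is not used
    return form_a, form_b


# Hypothetical pool: four items at each of three difficulty levels.
pool = [(f"{level}-{i}", level)
        for level in ("easy", "medium", "hard")
        for i in range(1, 5)]
form_a, form_b = split_into_forms(pool, seed=1)
print("Form A:", form_a)
print("Form B:", form_b)
```

Either form could then serve as pre-test, embedded test, post
test, or delayed post test.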

A second general approach to the construction of pre-
and post tests is to first define the content and skills
expected as outcomes from instruction and to then prepare
duplicate items for each area of performance to be tested.
The items are duplicate in terms of difficulty level,
knowledge required, and the specific skill or concept to be
applied. They are different only in the specifics of a given
problem or problem application situation. What results from
such an approach is a parallel set of items; one set may be
used for the pre-test and the other for the post test. If
the pairs of test items are developed, the members of each
pair may be randomly assigned to either the pre- or the post
test. In addition, the entire item assembly in the form of
the pre- or the post test may be used for either function.
That is, the two parallel forms may be used interchangeably
as pre- or post tests. They can also be used in other test
roles such as embedded tests or delayed post tests. The
basic plan under the second approach is to develop one good
test which is comprehensive in terms of its representation of
course knowledge and skills and in terms of a mixture of
relatively easy through difficult tasks. When one selects a
particular area of knowledge or skill outcome to be tested,
one simply develops two items each time, rather than only
one. One item is then assigned to the pre-test item pool and
the other to the post test item pool. This second method is
most useful when the domain is well defined, somewhat narrow,
and the tests to be developed will be short, perhaps under
30 items or thereabouts. The second method is probably
most suitable to the typical short course with its few well
defined intended learning outcomes.
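
The random assignment of the members of each duplicate pair
can be sketched as below. The function name assign_pairs and
the item labels are hypothetical; the only rule taken from
the text is that one member of every pair goes to the
pre-test pool and the other to the post test pool, with the
choice made at random.

```python
import random


def assign_pairs(pairs, seed=0):
    """Send one member of each duplicate item pair to each test form.

    pairs is a list of (item, item) tuples written for the same skill and
    difficulty. A random draw decides which member goes to the pre-test.
    """
    rng = random.Random(seed)
    pre_form, post_form = [], []
    for first, second in pairs:
        if rng.random() < 0.5:
            first, second = second, first
        pre_form.append(first)
        post_form.append(second)
    return pre_form, post_form


# Hypothetical duplicate pairs, one pair per intended learning outcome.
item_pairs = [("outcome1-a", "outcome1-b"),
              ("outcome2-a", "outcome2-b"),
              ("outcome3-a", "outcome3-b")]
pre_form, post_form = assign_pairs(item_pairs, seed=7)
print("Pre-test items: ", pre_form)
print("Post test items:", post_form)
```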

Either of the approaches described above will produce
pre-tests which make initial assessments of participants'
entry learning levels and post tests by which to make
inferences about the amount of learning in specific areas
following instruction. Such information is of value to the
persons who operate the course, the course participants, and
the persons who employ and professionally certify engineers.



Chapter 10 contains detailed procedures for the
construction of pre-, post, and other types of tests.
Detailed procedures for the proper sampling of test items
within the performance domain of interest are provided.
Both methods of preparing multiple forms of tests are
presented. Procedures for insuring the validity of the
tests are described. Chapter 13 presents detailed
information about how to use test data to make inferences
about the degree of learning which has been achieved by
individuals and to make judgments about the overall
effectiveness of courses.





Chapter 7

EMBEDDED TESTS -- THEIR PURPOSES AND USES

Embedded tests are simply assessment tasks built into
the instructional sequence and interspersed with other
instructional activities. Common forms of embedded tests
include homework problems frequently assigned after each
class session in typical courses in engineering and related
scientific and technical fields. Frequently these completed
homework problems are collected, corrected, the student's
performance recorded, and then returned to students.
Students may "cheat" on such homework assignments, but are
foolish to do so because doing the problems is a practice
activity, often the main learning activity by which
students become skilled in the application of course concepts
and principles.



Traditional Embedded Test Tasks

Although homework is given for practice, the results of
homework performance by students can be examined to reveal
the degree of student understanding of course principles and
concepts as well as the presence or absence of more basic
prerequisite skills. Good instructors interact with students
in precisely this manner. They frequently assign homework
in graded levels from simple to more complex problems. They
collect the homework, requiring that it be completed in a
reasonable length of time. They then correct the homework,
not only indicating if the problems are right or wrong, but
pointing out errors to students. Frequently, the
instructor identifies common problems which recur in a
given student's work. The instructor often calls the
attention of the student to the problem area or areas in
written comments or by a personal conference. Sometimes it
becomes apparent to the instructor that the majority of
students have misunderstood or not fully understood some
particular procedure or concept. Then the instructor often
takes steps to correct or remediate the area of
misunderstanding in subsequent lectures, laboratories, or
similar class learning activities. Homework, therefore, is a
form of assessing learning outcomes particularly useful in
formative evaluation activities.
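
The formative use of homework results, spotting a problem
that the majority of students missed, can be sketched as a
simple tally. The function name commonly_missed, the record
layout, and the 50 percent threshold are hypothetical choices
made for illustration only.

```python
def commonly_missed(results, threshold=0.5):
    """Return (problem, miss rate) for problems missed by more than threshold.

    results maps each student to a dict of problem -> True (correct) or
    False (incorrect). Problems a student did not attempt are ignored.
    """
    problems = sorted({p for record in results.values() for p in record})
    flagged = []
    for problem in problems:
        attempts = [record[problem] for record in results.values()
                    if problem in record]
        miss_rate = attempts.count(False) / len(attempts)
        if miss_rate > threshold:
            flagged.append((problem, miss_rate))
    return flagged


# Hypothetical corrected homework for three students on two problems.
homework = {
    "student1": {"problem1": True, "problem2": False},
    "student2": {"problem1": True, "problem2": False},
    "student3": {"problem1": False, "problem2": True},
}
flagged = commonly_missed(homework)
print(flagged)
```

A flagged problem would point the instructor to a procedure
or concept worth revisiting in a later class session.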

Although good instructors have been using these
procedures for many years, it is important to realize that
the data collected from the performance of students on such
"embedded test" tasks can be used to make strong inferences
about the degree of learning achieved by individual students
and the general effectiveness of particular courses in
achieving their intended outcomes. Other traditional
embedded assessment tasks include frequent quizzes, practical
or laboratory examinations, and tasks which require the
demonstration of a particular skill at various levels of safe
and competent practice. The design and completion of various
experiments, special projects, reports, and analyses are
other examples of common embedded tasks. Each of these tasks
calls for a skilled performance of some type. Often parts
of the performance can be observed directly, particularly in
the most critical stages. In addition, the performance
almost always results in a product of some type, whether it
be a set of design specifications for a bridge or a dam, a
design for a scientific experiment, or an analytical report
of the structural properties of a particular type of
material which is to be used in the fabrication of machine
parts.

The actual observation of the student's performance is
a very informative assessment of the person's level of
competence in using the skills and knowledge which have been
taught. Examples of areas where such observational
assessments are appropriate include performances using
special equipment such as scintillation counters, electron
microscopes, or x-ray crystallography equipment. Other
examples involve the selection and use of computational
algorithms in solving particular problems; the making of
certain assumptions about the way a problem task is
fruitfully approached, including identifying those solutions
which cannot be used; and the development and use of computer
programs for data analysis and/or simulations.

The direct observational assessment of the student's
performance (and the products resulting from the performance)
to determine the degree of learning, and specific strengths
and weaknesses in an individual's competence, is a powerful
assessment procedure. This differential assessment of
students' strengths and weaknesses in complex areas of
performance based upon direct observation by experts who are
also tutors is a form of embedded assessment task which ought
to be used more frequently. These types of learning
assessment procedures are very useful to instructors in
diagnosing what it is in particular that students are and are
not yet able to do. In addition, the assessment procedures
are also very instructive to the students. Embedded test
tasks comprise an integral part of the instructional
activities by which complex performances are learned. Persons
seldom master complex performances in one trial. Rather, many
trials are required and performance may be expected to be
very variable and incomplete across parts of the task,
especially in early stages of learning (Bugelski, 1971;
Gagne, 1977). Embedded tasks of the type which have been
previously described are very useful in testing the
individual's performance capability across the various parts
or component skills and knowledge required for skillful and
accurate whole task completion. They are extremely useful
for making decisions concerning when instruction can be
stopped in certain areas of performance because the person
has learned the skill or concept to a mastery level and
further instruction would be pointless. Embedded tasks are
also useful for determination of where additional instruction
and practice are necessary.

Embedded test tasks may take many forms. As we have
seen, they are often practice problems assigned as homework.
They can be quizzes, projects, experiments, reports, and
similar activities. However, embedded tests can also be
more abbreviated samples of work performance, as is the case
with pre-tests, post tests, and delayed post tests. Although
actual samples of performance under conditions similar to
those in the work setting are the best ways to assess the
degree of learning in a given area, other considerations
often prevent the inclusion of very many such assessment
tasks. Sometimes there is simply not enough time to provide
the necessary number of in-depth instructional and laboratory
learning activities and also provide equally complex
and involved assessment tasks by which to make inferences
about the degree of student learning. One solution to this
problem is to keep better track of each learner's success and
lack of success with each practice and laboratory task which
is assigned as part of the instructional sequence. Another
solution is to use an abbreviated sequence of tasks as test
items by which to make reasonable inferences about the degree
of learning achieved by students on the complete performance
task. The first solution has already been discussed in
terms of keeping good records of student performance in the
completion of instructional activities such as the doing of
homework problems or the carrying out of laboratory
activities. Let us now consider the second alternative.

Abbreviated Forms of Embedded Test Tasks

Even on regular tests in areas of complex performance,
the test tasks must be abbreviated. Often students are not
required actually to complete the solution of a complex
problem, but only to set up the problem in a correct manner.
This tests for the student's understanding of how to
formulate the problem, what class of solutions to consider
and select, as well as what mathematical models and
procedures to use to achieve a solution. It does not test
the student's ability to accurately complete the solution.
Oftentimes, when it is of interest to test the student's
facility with computational procedures as well as problem
formulation, the problems presented as test items are
modified to be more simple than those usually encountered in
practice. Values given are frequently in small or whole
numbers. Actual computation is deliberately simplified to
insure that the student can complete the problem in a short
period of time, usually minutes, without the aid of a
computer, extensive references, or tables.

Still another common approach is to ask students to
recognize, rather than to actually carry out, the incorrect
(or correct) application of principles learned in the course
of instruction. In such a situation, the test task might be
the description of an experimental design which an engineer
has developed to test the effectiveness of a mechanical
component, say an automobile door hinge, compared to other
hinge designs. With this type of test item several true and
false, multiple choice, or short answer essay questions may
be asked of the student about the problem situation which is
presented. These types of items can test for the learner's
knowledge and ability to recognize correct or incorrect
application of experimental design principles, to judge the
magnitude of experimental results in terms of being reasonable
or unreasonable, and to be alert to methodological errors.
Such assessment tasks can be very demanding, can be administered
in a much shorter time than would be needed to actually design
an experiment or carry out some other complex performance,
and can also be very informative about the degree of student
learning. Perhaps an example will help.

An Example of a Course with Embedded Test Tasks

In an earlier chapter a course titled "Hydrology and
Sedimentology of Surface Mined Lands" was discussed briefly
(Haan & Barfield, 1978). It was pointed out that the example
problems which are found at the end of each of the course's
six units define the functional competence and knowledge
participants need to develop to demonstrate the mastery of
the course content. Over the three day period that the
course is taught, the instructors present brief lectures.
Each lecture is focused around some common problem such as
the task of designing stream channels for diversion of water
from surface mining operations. The lectures and the
printed materials, illustrations, tables, graphs, and related
materials which are provided to participants in a single
textbook are all closely interrelated. The purpose is to
clearly illustrate: a) what the common problem is and the
range of variation which may be expected in parameters
affecting the solution of the problem; b) the appropriate
theoretical models, concepts, computational algorithms, and
procedures basic to solving the class of problems; and
c) the necessary adjustments, corrections, and modifications
of models and procedures derived from theoretical and
laboratory research to make them compatible with actual field
conditions where more variables are operating in ways which
cannot be controlled to the same degree as in a laboratory
experiment. The printed instructional materials, as well as
the lecture and problem solving activities of a practice
nature which follow short lectures, are all designed to help
engineers bring to bear the most appropriate knowledge,
theory, and procedures for the design of hydraulic channels
and storage structures. Students learn basic hydrologic
theory and principles as well as many rules of thumb and
specific procedures for making estimates about design
specifications for such structures within certain probabilistic
limits of maximum 24 hour storm rainfall and runoff, desirable
safety factors in the performance of the structures, across
various types of soil materials, and under differing slope
conditions.

Much of the learning necessary to apply competently
these engineering principles involves the use of complex
nomographs and the use of statistical tables which list
rainfall patterns and probabilities for various geographic
regions. Soil types must be classified into various
categories by which numerical values can be assigned to
maximum permissible flow velocities to avoid channel erosion
and sediment deposition downstream. Boundary properties of
stream channels must be placed into various categories
depending upon the type of vegetation in the channel or other
cover and the resulting retardance to flow values. Many
other variables, including watershed ground cover, season of
the year, and intended duration of the structures, must also
be considered. For each variable there are one or more
procedures by which to estimate a numerical value. There
are other procedures for combining these values and equations
to calculate design features of the stream channels and the
storage structures. There are still other procedures by
which to check independently the reasonableness of the design
specifications calculated. It is important to double check
and insure that the final design specifications are safe,
cost efficient, and effective from the standpoint of their
desired performance.


... specific source in which ... textbook for the course ...
specifically designed ... work setting in order that he or
she may continue to use its tables, procedures, ... examples
in the better solution of actual design problems.

Because of the complexity of the material and its proper
application to real problems, the developers of the course
embedded real problems as demonstrations in each section of
the course. The first set of problems in each unit of
instruction serves to illustrate how the course concepts and
principles are applied. These demonstration problems are
the main instructional tasks upon which instructors focus
and in which participants engage to learn course concepts,
skills, and procedures to solve particular types or classes
of problems.

Immediately after these embedded tasks or problems, there
is a detailed, point by point explanation of how the
problem can be solved. Often more than one method of solution
is given. The textual materials provide a step by step
illustration of how the course concepts and procedures are
applied to problems sampled from the array of those
encountered in the real world practice of surface mining
activities. This allows an individual learner to study the
background theoretical and empirical material presented in
each chapter and then to work along with the demonstration
problems in the actual application of these principles to
real problems.

After the demonstration problems, a second set of
parallel but different problems is presented. These are the
embedded tests. They require the learner to demonstrate and
transfer the application of course principles and concepts
learned in the practice problems to another set of similar
problems drawn from the same conceptual universe of real
practice situations. During the course, participants are
asked to solve these problems and allowed periods of from 1
to 2 hours to do so. Sometimes the problems are to be only
partially completed or set up. Following this the
instructors hand out completed solutions to the problems in
order that these may be studied and used immediately (and
in the future as well) by the engineer to check on the
accuracy of his or her application of course concepts and
procedures.

One other property of both the embedded practice problems
and the embedded test problems needs mentioning. Both
cumulate in scope and difficulty over the sequence of six
chapters in the text. That is, the problems in the first
parts of the course are small and consist of only part of
the total task of designing the necessary hydrologic
structures needed to divert and store runoff water for surface
mining operations. Later chapters in the text require the
use and integration of all of the preceding material toward
the design of complete drainage and storage systems. This
is a strong feature of the course. It is difficult to
integrate properly a large body of concepts and skills
required for a competent performance in a complex area.
There is no reason to believe that persons can easily learn
to put together all of the parts of such complex processes
unless they have been given practice and specifically
instructed in how to do so (Bugelski, 1971; Gagne & Paradise,
1961; Gagne, 1967, 1977; Snelbecker, 1974).

Reasonable Expectation for Achievement Within a Short Course

What is it reasonable to expect in the way of learning
outcomes from a three day participation in this continuing
education "Hydrology and Sedimentology of Surface Mined
Lands" short course? Is it possible that enrollees will
become facile with all of the complex theories, concepts, and
procedures encountered in the course at the end of the three
days? In fact, it is not. About all that can be hoped for
is that the participants will:

a) become highly motivated to continue study and use of
the course textbook or manual because they see how useful it
can be to them in improving their performance in their daily
work tasks.

b) become familiar with the main ideas, concepts,
procedures, algorithms, and the adjustments which must be
made in these to accommodate differing soil types, rainfall
patterns, surface cover, and so forth.

c) know how to use the many procedures, nomographs,
tables, charts, and models presented in the manual in
intelligent ways to produce reasonable solutions. Much of
this is knowing when, why, and how to use a particular
approach to solve a particular design problem, knowing what
the reasonable values should be for certain types of
problems, and knowing how to check on the accuracy of one's
calculations by independent means.

The content and skills of the course have considerable
utility for the participants, particularly if they are
involved in surface mining engineering. Some weeks or months
after the course is completed participants may become more
skilled in the use of the procedures through application of
course content on the job. Many highly technical continuing
education courses share this characteristic.

Practical Problem: Embedded Test Tasks in Short Courses






Suppose now that one wishes to evaluate the learning
outcomes for the course at the end of the three days. How
might this be accomplished? One way might be to require each
participant to complete individually and fully each of the
many embedded tests or problems at the end of each chapter
or unit of study. These could be collected and corrected, and
an individual conference could be scheduled with each student
to report their progress and to instruct the student in any
areas needing more attention. The final, and very complex,
cumulative problem tasks provided in the Hahn and Barfield
(1978) text are excellent assessment tasks. Their completion
requires the proper use and integration of all the prior
learned information and skills. As sound as this procedure
is with respect to instruction and the assessment of learning
outcomes of the participants, there is a serious problem
with this approach.

The problem is that it would take much longer than the
available time to carry out such an assessment, perhaps six
to nine days on the average rather than the three which are
available. The persons who come to short courses often do
so only by making sacrifices within an already very demanding
schedule. Furthermore, the instructors need to spend most
of their time and effort on instructional activity, not in
administering a massive test which takes days to complete and
even more days to score. Moreover, many participants may
have attended for purposes of acquiring an overview of the
content and methods of the course, not to become proficient
in the actual design of drainage and storage systems. In
addition, most participants would probably perform poorly on
such a comprehensive test. Fully competent performance on
these types of complex problems is not expected as an
immediate outcome for the three day short course. The
limited time available also makes the completion of massive
amounts of homework or major test problems by participants
not feasible. The correction of massive amounts of completed
homework by instructors is also not feasible. About all that
can be done is to work through some good illustrative
demonstration problems and to complete parts of homework
problems under the supervision and assistance of the course
instructors. How, then, could one design a test to determine
if the learning outcomes for the course had been achieved?
In what ways might test items or tasks be developed which
would be effective measures of performance in key areas, but
which would be able to be administered and scored in a matter
of a few minutes rather than several hours or days?

Abbreviated Embedded Test Tasks: An Illustration
Appendix B contains a sample multiple choice test which
was designed for the hydraulics of open channels section of
the "Hydrology and Sedimentology of Surface Mined Lands"
course. This is one of the six units of instruction in the
three day course. The total test length for this unit is
eleven items. The time required to complete the test is
approximately ten minutes. Yet the test items assess most
of what is of interest with respect to the achievement of
specific learning outcomes at the end of the unit of
instruction. Similar tests of from five to ten items can
be prepared for each of the other five units of instruction.
These can be used at the end of each unit of instruction in
the course. A time limit of ten minutes can be set for each
test. Participants can complete the test immediately after
a unit has been taught. Because of the objective nature of
scoring, the participants' responses can be corrected
immediately with a scoring key, requiring no more than ten
to fifteen seconds per participant's paper. The results of
the performance on the test by participants on an individual
basis and for the whole group can be reported to individual
participants within minutes of the test administration. (More
details about how to carry out this procedure are presented
in Chapter 13.) Common points of misunderstanding can be
noted by instructors and remediation or correction attempted
in subsequent instruction and dialogue. The results from
all such short embedded tests can be added together across
the six units of the course and reported as an indication of
an end of course learning outcome for particular students.
Results can be summed and averaged across students to make
inferences about the success of the course for purposes of
formative or summative evaluation.
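The scoring and summing just described can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions: the scoring key, the participant labels, and all of
the scores are hypothetical and are not taken from the sample test
in Appendix B.

```python
from statistics import mean


def score_paper(responses, key):
    """Count the responses that match the scoring key."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def course_totals(unit_scores):
    """Add each participant's unit scores into an end of course total."""
    return {name: sum(scores) for name, scores in unit_scores.items()}


# Hypothetical scoring key for one eleven-item unit test.
key = list("BACDABDCABC")

# Hypothetical answer sheets for two participants.
papers = {
    "participant_1": list("BACDABDCABC"),  # matches the key on every item
    "participant_2": list("BACDCBDAABD"),  # differs from the key on three items
}
unit_result = {name: score_paper(r, key) for name, r in papers.items()}

# Hypothetical scores on the six unit tests of the course.
unit_scores = {
    "participant_1": [11, 8, 9, 7, 10, 6],
    "participant_2": [8, 6, 7, 7, 9, 5],
}
totals = course_totals(unit_scores)
group_average = mean(totals.values())
```

The per-participant totals serve the individual report, and the
group average is the kind of figure that might feed a formative or
summative judgment about the course.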

In addition, duplicate or parallel items can be written
for each of the items in the embedded tests for each
of the units. These can then be assembled into a pre-test, a
post test, or a delayed post test to serve other purposes.
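One way to picture the assembly of parallel items into separate
test forms is the short Python sketch below. The pool structure,
the objective names, and the item labels are all hypothetical; the
report does not prescribe any particular bookkeeping scheme.

```python
def assemble_forms(pool, form_names=("pre", "post", "delayed_post")):
    """Deal one parallel variant of every objective into each test form."""
    forms = {name: [] for name in form_names}
    for objective, variants in pool.items():
        if len(variants) < len(form_names):
            raise ValueError(f"not enough parallel items for {objective!r}")
        # The first variant goes to the first form, the second to the next, etc.
        for name, variant in zip(form_names, variants):
            forms[name].append(variant)
    return forms


# Hypothetical pool: each objective has three parallel items.
pool = {
    "enter_correct_table": ["T-a", "T-b", "T-c"],
    "check_estimate": ["E-a", "E-b", "E-c"],
}
forms = assemble_forms(pool)
```

Each form then tests the same objectives with different but
parallel items, which is the property the paragraph above relies
on.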




The point to all of this is that embedded test tasks are
often excellent learning assessment devices because of their
directly parallel structure to the instructional activities
and tasks and because of their practical nature. Yet they
are difficult to administer, score, and use to instruct
students about errors in learning and areas needing further
attention, especially in courses operating under severe time
limitations. Because of these factors, short objective type
test items, consisting of abbreviated performance tasks
similar to the eleven items presented in Appendix B, ought to
be designed and used more often. How may such short but
powerful objective type test items be developed? Perhaps
examination of the purposes and properties of the eleven
items displayed in Appendix B can help answer this question.
Not only is the example test provided in the Appendix, but
detailed instructions about how to prepare abbreviated test
tasks in general are provided. Additional information about
how to design and use efficient abbreviated test tasks is
also found in Chapters 10 and 13.

Purposes and Properties of Abbreviated Embedded Test Tasks

Appended to the set of eleven test items in Appendix B
are a group of figures and tables. These are presented to
the students along with the test items. The hydrology and
sedimentology course teaches the proper approach and
solution of problems through the use of many tables,
nomographs, and similar procedures. An important part of the
task for the student is to learn how to use such material as
well as to discriminate from among the many tables and figures
those which provide the information needed for the solution
of a particular problem. By having this series of graphs,
figures, and tables in one place as an appendix to the test
items, the student must discriminate among the multiple
displays to select the appropriate ones for the solution of a
problem part presented in an individual test item. In
addition, the student must know how to enter the table or
graph and retrieve the information needed. Thus, the test
items assess two skill areas which are key objectives in the
course. Each item simulates part of a real problem situation.

The items for the most part do not require computation.
Items one through three test for knowledge of basic
properties and relationships required to solve this class
of problems. Items four through eight are all constructed
around one common problem situation concerned with the design
of a particular open channel given certain performance
specifications, soil type, cross section, and slope. Items
four and five test for knowledge of how to enter the correct
tables and extract the correct values for two design
variables given specified problem conditions. Item six tests
for the student's knowledge of an estimation procedure and
the use of the procedure to double check on the initial values
obtained for channel specifications. Item seven tests for
the student's ability to recognize the reasonableness of a
result based upon a short cut estimation procedure. Item
eight is the first item to actually require any computation.
Its correct solution requires the individual to use
information presented in the original problem statement
preceding item four and the additional information given in
item six. It requires the individual to actually calculate
the top width, bottom width, and the depth of the channel
allowing the necessary freeboard. To answer this question
properly, one must know which variables to attend to, which
computational algorithms to use, and again, be aware of the
range of resulting values which are reasonable given the
problem characteristics. Items 9 through 11 are another
series of similar questions written around a common problem
situation. These three items test for knowledge of
procedure, use of rules of thumb to modify models and
computational procedures, and checks on estimation procedures.
The skills tested for by these three items are similar to
those tested for in items four through eight, but the
problem characteristics have changed.

Collectively the eleven items test, quite well, the
level of competence achieved in the basics of solving these
types of problems. Yet the time required for this testing
is small because of the manner in which the tasks are
presented in the items. The emphasis is not upon calculation,
but upon knowledge of procedure and proper practice, although
actual computational skill in determination of design
specifications is also tested. In addition, there is a
hierarchy of difficulty in the test items. The first two
items test for basic concepts and knowledge. Later items
test for knowledge and skill in the use of proper procedure.
The last items require integration of the knowledge and skill
required in all of the earlier items and actual computation
of design specifications given different characteristics in
the actual problem situation. Consequently, the results of
students' performances across items tell something about
what parts of the intended learning outcomes they have
achieved well or not so well. This is useful information
for individual learners in order that they may correct or
further develop their knowledge and skill in needed areas.
It is also information very useful in the summative evaluation
of the effectiveness of a given course when the pattern of
correct and incorrect responses is examined across all
students. If there are problems in teaching certain
procedures or methods in the course of instruction, these
are likely to show up in the test results on specific items
or clusters of items.
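Examining the pattern of correct and incorrect responses across
all students can be sketched as a simple tally of the proportion
of students answering each item correctly. The response matrix and
the 0.5 cutoff below are hypothetical choices made only for
illustration; the report does not specify a cutoff.

```python
def proportion_correct(matrix):
    """Return, for each item, the share of students who answered it correctly.

    matrix holds one row per student; each row has a 1 (correct) or
    0 (incorrect) for every item, in item order.
    """
    n_students = len(matrix)
    return [sum(column) / n_students for column in zip(*matrix)]


def flag_items(p_values, cutoff=0.5):
    """List the 1-based numbers of items whose proportion correct is below cutoff."""
    return [i + 1 for i, p in enumerate(p_values) if p < cutoff]


# Hypothetical results: four students, five items.
results = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
p_values = proportion_correct(results)
weak_items = flag_items(p_values)
```

An item or cluster of items that most of the group misses points
to a procedure or method that may need to be taught differently.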



Advantages of Multiple Choice Items as Embedded Test Tasks

It should be clear from the example items in Appendix B
and the accompanying text that multiple choice items can be
written to test for very complex and high level skills as well
as for recall of factual information. Many times multiple
choice test items are written only at the factual information
recall level. A test composed of such items does not provide
information on higher level capabilities such as skill in
formulation of a problem, using computational procedures
correctly, applying estimation procedures, and integrating
information from several different sources to make informed
decisions about how to proceed in the solution of a given
problem. Tests composed of items similar to those listed
in the sample test in Appendix B are much more valid for
such purposes than are tests composed of more typical factual
recall items.

Tests similar to the one displayed in Appendix B are
also more efficient than other types of longer embedded
performance tasks. The reasons for this are that they can
be completed in much less time, can be scored very rapidly
with a standard scoring key, the results can be tabulated
and presented to the learner immediately after the test is
completed (see Chapter 13), and, if properly constructed,
they are good approximations to more complex, longer duration
performance tasks and work samples.

Precautions in Developing Multiple Choice and Other Objective
Test Items

When developing tests similar to the one displayed in
Appendix B there are a number of precautions which should be
observed. First, it is best to develop the individual items
from full blown, actual problem situations or tasks. That
is, the actual complex and very time consuming problems which
are usually assigned as homework problems or which are used
as laboratory or demonstration activities ought to be the
basis of the individual and abbreviated multiple choice test
items which are generated. One should start with the real
problem or problems representative of the performance domain
which is being taught. Then one should work out these actual
problems in full. Then one should ask, "In what ways might
I break the solution of this problem down into individual test
items of a multiple choice format?" The task is to capture all
the critical aspects of the decisions and judgments which
persons must make in the solution of real problems in all of
their complexity with a small number of test items.
Different parts of this decision making and judgment can often
be captured in different multiple choice test items written
about one common problem situation. Certain information can
be given so that the person need not necessarily do all the
actual work required to solve a real problem of the type
under consideration, but so that he or she must discriminate
the relevant variables, apply knowledge of procedural rules,
select appropriate models, modify computational procedures
according to problem characteristics, and so forth. Beginning
with a real and complex problem, typical of the type
encountered in the real domain of work, helps insure that the
test items written to assess knowledge and skill about
specific areas of procedure will be significantly related to
intended learning outcomes. Collectively the items should
be a reasonable test of the breadth and depth of the person's
skill. Basing test items upon real problems also insures
that inferences made about the learner's achievement from
test scores will be reasonably valid.

If one begins the other way, by writing only those
multiple choice test items one happens to think about,
without referencing each item in some actual complex problem,
one is likely to generate many factual recall and simple
information items. In addition, the total test comprised of
the assembled items will often not be very broad or test for
depth of skill across areas of complex performance. Just as
is the case in the selection of tasks and activities by which
to instruct persons and provide them with practice in learning
complex performances, so too in test development it is best
to begin with a typical set of complex problems of the type
likely to be encountered by the practicing engineer. Once
these problem types have been selected, it is possible to
break them down into smaller components for purposes of
instruction or for testing of student achievement. Chapter
10 provides detailed procedures for insuring that both test
items and instructional tasks are sampled from the full
performance domain to insure validity for each. Consequently,
no more will be said about this methodology at this point.






Another caution is necessary. For methodological
reasons it is important that the individual test items be
logically independent from one another with respect to a
correct answer on one item being required for the correct
answering of other questions. That is, one does not wish to
prevent the student from being able to demonstrate knowledge
or skill on subsequent test items because he or she obtained
a wrong value in response to an earlier item. Consequently,
later items should not require correct responses to earlier
items as a condition of their correct solution.

This does not mean that multiple sets of items may not
be written about some common problem situation. Nor does it
mean that information needed to solve a later problem in a
series of such items cannot accumulate in subsequent items.
It does mean that when such information accumulates it should
do so in the item stems, and not in the options which are
presented as alternatives from which the student is to choose.
Items four through eight in the sample test in Appendix B
illustrate this point nicely. All four items depend for
their correct solution on information presented about a
given problem. This information appears in a stem which
precedes all of the items. This common stem appears prior
to item four. Students are told they will need to use this
information for the next four items. To help the student
remember that this block of items goes together, all four
questions and the common item stem are enclosed in a bracket
on the left margin of the test booklet. Furthermore,
information presented in the stem of item six is required to
correctly answer item seven. However, in no case does any
item require that a student answer any previous item
correctly if he or she is to obtain the correct answer to a
subsequent item.

It is important to maintain such item independence. If
an entire series of items depends upon the correct answer to
an earlier item, all of the remaining series will be
incorrectly answered if an error has occurred in response to
that first item. The subsequent test items are not valid
indicators of what the individual knows or can do under such
a restriction. The value of any individual test item is to
assess some particular knowledge or skill. Through multiple
items designed to assess multiple areas of competence at
various levels of skill, it is possible to make strong
inferences about the extent of learning of the individual
based on the total test score. If it is of interest to
determine if the student can correctly work through a complex
series of procedures, where in real life the correctness of
the next step depends upon the accuracy of the previous
steps, this can also be tested. However, it should be tested
in a separate and more complex item. Item eight as well as
item 11 are examples of this on the
sample test in Appendix B. This complex type of learning
outcome is best tested for by a performance task which
simulates a complete real world problem.




Generalization of Item Construction Procedures to Other
Test Formats

It should be clear that one does not need to use
multiple choice items to achieve the objective of having a
short but powerful test by which to assess the degree of
learning of students. Each of the multiple choice items
shown in the sample test in Appendix B can be used as a
constructed response test item. That is, each stem for each
multiple choice item may be used without any of the
distractors and the correct answer which presently comprise
the options from which the student is to select. If the
options are omitted the student must simply construct the
correct answer. Provided the item stems are written to call
for particular types of judgment, skill, concepts, or
application of procedure, the constructed answer can be short
and can also be easily scored in an objective manner. There
is also an advantage in using the constructed response format
in that there is less possibility of guessing producing a
"correct" response. On a four option multiple choice test
consisting of n items, by chance alone a score of .25n will
occur across persons on trial administrations. One
compensates for this by insuring that there are enough items
to test a broad range of knowledge and skill and by setting
an acceptable level of performance well above the chance
level. Various methods exist for correcting for guessing
on multiple choice tests. These involve subtracting more
than one unit value of incorrectly marked items from the
total number of items marked correctly, but not doing so for
items the student left blank. Generally, however, it is
better to include enough items on the test and to set a
criterion well above the chance level than to correct total
test scores for "guessing" when using multiple choice
formats.
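The arithmetic behind the chance level and behind setting a
criterion well above it can be sketched as follows. Two
assumptions are made that the text itself does not state: pure
guessing is treated as a binomial process with equally likely
options, and the correction shown is one commonly taught form
(rights minus wrongs divided by the number of options less one,
with blank items ignored), which may not be the exact method the
authors have in mind. The criterion of eight items is likewise
hypothetical.

```python
from math import comb


def chance_score(n_items, n_options=4):
    """Expected number correct from blind guessing on every item."""
    return n_items / n_options


def corrected_score(right, wrong, n_options=4):
    """One common correction for guessing; blank items are not penalized."""
    return right - wrong / (n_options - 1)


def prob_reaching_by_guessing(n_items, criterion, n_options=4):
    """Binomial probability of guessing at least `criterion` items correctly."""
    p = 1 / n_options
    return sum(
        comb(n_items, j) * p**j * (1 - p) ** (n_items - j)
        for j in range(criterion, n_items + 1)
    )


# Eleven four-option items, as in the unit test described in the text.
expected_by_chance = chance_score(11)
# Hypothetical criterion of eight correct out of eleven.
risk = prob_reaching_by_guessing(11, 8)
```

With a criterion set this far above the chance level, the
probability of reaching it by guessing alone is small, which is
the reasoning behind preferring a high criterion to a correction
formula.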

If the class is small and there is time to administer a
test similar to the sample test in Appendix B it is probably
better to do so as a constructed response test. The scoring
of these tests takes more time and intelligence. However, if
the class is large, and time is at a premium, it is probably
better to use the multiple choice format. This is because
students may place their answers on a standard answer sheet
which may be scored immediately with a hand scoring key or
machine. This is a very rapid and accurate way to proceed
and insures that students can receive the results of their
knowledge assessment right away, a basic requirement for
good use of tests for instructional purposes.

Whether the test format is multiple choice or short
answer constructed response, the most important consideration
is to have well developed test tasks in the form of questions
or item stems which have been sampled properly from the
performance domain being tested and which also have been
designed to test for specific skills or knowledge. Tests
designed to meet these specifications make excellent embedded
test tasks. They may be interspersed as short quizzes at
key points in the instructional sequence. Their use takes
little time and reveals a great deal to the instructor and
the students about the achievement of particular learning
outcomes. For additional examples of how to develop tests
with these characteristics, the reader should turn to
Appendix B and Chapters 10 and 13. Well designed abbreviated
test tasks are particularly important in short courses where
there is inadequate time to use the assignment of homework
problems and laboratory activities which are the embedded
test tasks most frequently used in traditional, long term
courses in technical fields.

Generalization of Item Construction Procedures to Pre-, Post,
and Delayed Post Test Construction

Most of the details and advice presented in this chapter
concerning how to go about selecting and developing good
embedded test tasks and items also apply to the construction
of pre-test, post test, and delayed post test items. To the
extent that test items and tasks are developed along with
instructional tasks and learning activities, as both are
sampled from the common domain of expected performance, the
test items will be better than if otherwise produced. This
is because test items are actually specific tasks which are
reserved for careful observation of student performance. From
the observation of student performance on this sample of
tasks, the instructor makes inferences about the degree of
student learning of specific knowledge or skill. Any one test
item is not sufficient for making inferences across the entire
domain of performance. Rather, individual items should be
constructed to determine the student's level of skill or
knowledge in specific critical aspects of the performance.
Test items must be properly representative of the range of
knowledge and skill required for effective performance of the
complex activity being taught. Student performance across a
number of well developed items of this type can be a valid
measure of the degree of student learning and the success
of instruction.

Because test tasks or items are so closely related to
the topics, materials, and activities of instruction, it is
best deliberately to develop test items at the same time one
is developing instructional tasks and activities, a point
repeatedly made throughout this book and a procedure described
in detail in Chapter 10. It is also best to use these
assessment tasks during the course of instruction as embedded
assessment procedures to inform both learner and instructor
of the progress, accomplishments, and problems of individual
persons in order that errors may be corrected and areas not
in need of further instruction be omitted from further
instructional activity. If such procedures are followed
one develops a large pool of assessment tasks and items.
Assemblages of these test tasks or items may be prepared
and used as time efficient and yet highly valid pre-tests,
post tests, and delayed post tests.

Conclusions

Traditional forms of embedded tests including quizzes
and homework problems are common instructional methods. It
is possible to abbreviate complex performance tasks into
short and efficient test items by which to make sound
inferences about how much and what students have learned
following units of instruction. It is best to develop test
tasks at the same time the instructional activities are being
developed for teaching the course. Both test tasks and
instructional tasks should be developed from complete and
typical complex problems the engineer will face in the work
setting. This helps to insure that the test items which
are developed, as well as the practice problems which are
assigned, will consist of a range of difficulties and include
a hierarchy of concepts and skills required for effective
performance. Care should be taken to insure abbreviated
tests include items of different difficulty and that
collectively the test is a good assessment of the range of
content knowledge and skill which is desired.






The procedures which apply to the development of good
embedded test items also apply to the development of test
items for pre-tests, post tests, and delayed post tests.
The procedures also apply to tests of different formats,
including multiple choice, essay, short answer, and problem
completion types of items. The main purposes of abbreviated
embedded tests are to help diagnose learner achievement and
learning problems, provide information on course
effectiveness usually in a formative way, and serve as a part
of the learning experiences and activities of students.

Although embedded tests are most useful in providing
information about how much and what students have learned or
not learned immediately following a unit of instruction,
the information they provide can also be used to make
inferences about the achievement of students in meeting the
overall learning outcomes set as objectives for the course.
This can be done by keeping a record of performance of
individual students across all of the embedded tasks for
each unit of work. Collectively this record can be used
to draw a learning profile for individuals and groups.
Under typical situations the profiles of students may be
expected to improve with time and toward the end of the
course as more information, concepts, and skills are
mastered.
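A learning profile of the kind described can be sketched by
converting each unit score to a percentage, so that unit tests of
different lengths can be compared. The test lengths, participant
labels, and raw scores below are hypothetical and serve only to
show the bookkeeping.

```python
def unit_profile(raw_scores, items_per_unit):
    """Convert raw unit scores to percent correct, one value per unit."""
    return [round(100 * s / n, 1) for s, n in zip(raw_scores, items_per_unit)]


def group_profile(all_raw_scores, items_per_unit):
    """Average the individual percent-correct profiles, unit by unit."""
    profiles = [unit_profile(r, items_per_unit) for r in all_raw_scores.values()]
    return [round(sum(column) / len(column), 1) for column in zip(*profiles)]


# Hypothetical lengths of the six unit tests.
items_per_unit = [11, 8, 10, 6, 9, 7]

# Hypothetical record of raw scores kept across the six units.
record = {
    "participant_1": [6, 5, 7, 5, 8, 7],
    "participant_2": [8, 4, 6, 3, 9, 6],
}
individual = unit_profile(record["participant_1"], items_per_unit)
group = group_profile(record, items_per_unit)
```

Reading the individual list from left to right shows how one
participant's performance changes over the sequence of units, and
the group list does the same for the class as a whole.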




Information of this type is also useful in determining
which parts of the course are most effective and which parts
need to be improved. Programs or courses of instruction are
almost never totally good or bad. They are usually a
mixture of effective and less effective learning activities
and instructional methodologies. By being dispersed
throughout the course, administered after each unit of
instruction, and because they are specific to well defined
sections of courses and particular concepts and skills within
these sections, embedded tests provide diagnostic information.
This information is useful to the students in the course who
learn right away what they do and do not understand. It is
useful to the course instructors in immediate modification
of instruction to better meet the needs of students. In
addition, it provides insight into which instructional
activities and teaching methods are working well and which
ones need to be improved. For all these reasons, embedded
learning assessment tasks are very important tools, especially
in formative evaluation activities.






Chapter 8
POST TESTS, THEIR PURPOSES AND USES 

No matter how adequate the embedded test tasks may be
in a course or unit of instruction, typically some indication
of students' end of course achievement is needed. This is
a summative evaluation function which serves to inform the
learner, the instructor, and appropriate others (such as
employers, professional certification agencies, and
governing boards for continuing education units) of the
amount of student learning at a particular point in time.

Cumulative Nature of Post Tests 

A post test should be cumulative across the various
levels and sequences of skills and knowledge instructed in
the course. That is, there should be items or tasks which
test for knowledge and skill from all portions of the course.
Items should be included requiring: a) recall of simple
information, b) recognition of correct principles, c) use of
appropriate concepts and procedures, d) formulation of problem
situations in terms of specific and appropriate sequences
of activity, and e) correct technical solution of complex
problems. If the post test measures only recall of facts or
recognition of procedure, nothing can be inferred from the
test scores other than students' capabilities in these areas.
Consequently, care must be used to assemble post tests which
include a proper range of tasks across the major levels of
skill and knowledge required for effective performance in the
area being instructed.
Suppose one is teaching a short course on the use of
microprocessors in automated control of industrial machine
processes. Now suppose a post test is developed. The test
items require naming the typical components of such systems,
recognizing proper and improper logical steps in programming
such systems, and naming the producers of certain types of
equipment needed to implement microprocessing controls.
After the test is administered, what types of inferences may
be properly drawn from examination of students' performance
scores? Assuming that the test items are all valid and
properly constructed, inferences can be made only about the
ability of students to name the components of such systems,
recognize proper and improper logical steps in programming,
and recall the names of equipment producers. What is being
tested is naming, recognizing, and recalling certain facts,
principles, and names of companies. In no way can
performance on such a test be construed as demonstrating
mastery of actually planning, assembling, and integrating
microprocessing equipment into industrial machining
production processes. This test is perfectly appropriate if
the intended outcomes of the course are the naming,
recognizing, and recalling goals. However, if the objective
for the course is the performance capability of integrating
microprocessing equipment into industrial machine production
processes, the post test is very inadequate as an assessment.
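One way to make the coverage requirement concrete is to tag each draft item with the level it tests and list the levels left untested. The sketch below assumes short labels for the five levels a) through e) named earlier; the labels, the item records, and the function name are hypothetical conveniences, not part of the report.

```python
# Short labels (assumed here) for the five levels a) through e) in the text.
LEVELS = ["recall", "recognition", "use", "formulation", "solution"]

def missing_levels(items):
    """Return the levels that no item on the draft post test addresses."""
    covered = {item["level"] for item in items}
    return [level for level in LEVELS if level not in covered]

# Hypothetical draft like the microprocessor test: naming and recognizing only.
draft = [
    {"id": 1, "level": "recall"},       # name typical components
    {"id": 2, "level": "recognition"},  # recognize proper programming steps
    {"id": 3, "level": "recall"},       # name equipment producers
]

gaps = missing_levels(draft)
```

For this draft, `gaps` lists the three higher levels, which is the sense in which such a test says nothing about the capability of integrating equipment into production processes.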

One way around this problem is to select test tasks to
represent the full range of the instructional tasks developed
for the course, a procedure suggested in the last chapter on
embedded test tasks and described more fully in Chapter 10.
If one has a range of items parallel to the various levels
of instructional tasks in terms of knowledge and skill
required, a good post test can easily be assembled. But if
one has the embedded test tasks and one has used them all
along during the course of instruction, why develop and use
a post test at all? There are two reasons why a post test
makes sense even if embedded test tasks have been used
throughout the course of instruction.

Revealing Cumulated Learning

A simple summation of a student's performance across
all of the embedded tasks may not be an accurate assessment
of his or her competence achieved at the end of the course.
The reason for this is that embedded tasks are useful
primarily for providing the student and the instructor with
information about what is being learned well and what is not
during instruction. The results of individual embedded test
tasks often reveal weakness in a student's learning, flaws
in the instructional methods, or inadequacies in the
instructor's ability in teaching particular skills,
procedures, or concepts. When these problem areas are
identified they are usually corrected. When embedded test
tasks or items are used in this manner they contribute
greatly to effective teaching and mastery of key course
skills and knowledge by students. This is a formative
evaluation function. Information is provided by which
future instructional activities of the learner and the
instructor are modified to achieve mastery of course content.
Thus, students who initially do poorly on specific test
tasks or items embedded in the sequence of instruction
should master the areas in which they are having difficulty
by the end of the course. There is a great deal of empirical
research which indicates this is the typical pattern in
courses where embedded test tasks and their results are used
in a formative way. The Personalized System of Instruction
Method (PSI), sometimes called the Keller Plan, operates
precisely this way. PSI has proven extremely effective for
the instruction of engineering and other technical and
scientific courses (Cleaver, 1976; Ericksen, 1974;
Heimback, 1979; Kulik & Kulik, 1975; Kulik, Kulik, & Cohen,
1979).

For this reason, a simple summation of all of the
embedded test task performances of students is not accurate.
It provides too low an estimate of students' exit level
learning. Because of this, nearly all PSI and similar mastery
learning approaches to instruction typically provide some
final and comprehensive test of performance. These tests
usually consist of items sampled from across the entire
course of instruction. The items are parallel but not
identical to the items used in the embedded tests during
the course of instruction. This insures that students are
required to learn general skills and principles and not
simply the array of correct responses to a particular set of
items, a possibility if the same items were used on both
the embedded test tasks and the final examination or post
test. There is much research on the effectiveness of
instruction which makes use of embedded test tasks in a
formative way coupled with the use of summative post tests.
Where the focus of instruction is upon mastery of complex
skills and performance capability these testing procedures
are very effective (Gagne, 1962; Grogan, 1979; Kulik & Kulik;
Kulik, et al., 1979).
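The point that summed embedded scores understate exit level learning can be seen in a small worked case. The numbers below are invented for illustration only; they assume a student whose first attempts on embedded tasks were weak, who received corrective instruction, and who then took a parallel-item post test.

```python
def embedded_average(first_attempt_scores):
    """Mean of first-attempt scores on embedded tasks (fractions correct)."""
    return sum(first_attempt_scores) / len(first_attempt_scores)

# Hypothetical first attempts, recorded before corrective instruction.
first_attempts = [0.40, 0.55, 0.60, 0.70]
# Hypothetical score on a post test built from parallel, not identical, items.
post_test = 0.85

average = embedded_average(first_attempts)  # 2.25 / 4 = 0.5625
shortfall = post_test - average             # how far the average understates
```

Here the embedded average is well below the post test score because it counts errors that instruction later corrected.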

Measuring Functional Integration of Knowledge and Skill

The second reason for use of post tests is to provide
an assessment procedure by which to test for the student's
ability to integrate and use wisely all the component
knowledge and skills which have been taught in a course. On
the basis of individual tests of separate components of the
total performance, it is not possible to infer that persons
have learned to put all of these components together into a
skilled performance involving the solution of a complex
problem. If one wishes to test for skilled performance in
integration and wise use of the range of skill and knowledge
required for solving problems of particular types, one needs
to develop test items which have this particular requirement
for their correct completion. Again, increasingly
complex embedded test tasks, of the type included in the
Hydrology and Sedimentology course referenced earlier
(Haan & Barfield, 1978) and similar to the sample test shown
for one unit of that course (see Appendix B), can do much
to assess the student's ability to perform the integration of
particulars. Good embedded test tasks cumulate across the
course from test tasks of simple and particular knowledge
and skill to very complex test tasks requiring discrimination,
judgment, integration, and wise use of a large amount of
facts, concepts, procedures, principles, and theories.

Performance-based engineering education approaches are
designed in this manner. A series of successful project
activities and assessments of complex performance capabilities
are scheduled throughout a student's program. The complexity
of these performance tasks and projects increases systematically
as the student progresses (Grogan, 1979). If this is so, why
would a post test which tests for similar ability to integrate
and use course knowledge and skill wisely and well be needed?






The answer is similar to an earlier answer concerning
the use of post tests. Even complex embedded test tasks are
intended more for practice and instructional diagnosis of
what students need to learn and what they have already
learned at particular points in the course than they are for
summative evaluation statements about the overall amount of
learning resulting from the course. Even the most demanding
and complex embedded test tasks are typically practice
situations. They are usually administered in situations
where the student may ask for and properly receive assistance
if he or she has difficulty on any point in the problem.
This is appropriate because the task is seen as a way to
diagnose what the student can or cannot do in the interest of
achieving mastery of the course content. Both the instructor
and the student need to know what must still be learned to
perform some complex task correctly and efficiently.
Collectively the record of an individual's performance on
these types of embedded test tasks can reveal much about
the student's rate of progress through the course. Sometimes
it becomes clear that a student is having great difficulty
and progressing too slowly. If a student needs too much
time and extra assistance to master material other students
can master much more quickly, it becomes a serious drain on
limited time and teaching resources. Likewise, a record
of individual student performance on embedded test tasks
also reveals which students are highly competent and rapid
learners of the course content. Yet when all is said and
done, it is usually desirable to have an independent estimate
of the student's learning at the end of a course to sum up
the overall degree of learning. The results of embedded test
tasks and these final assessment tasks may be combined and
used as an overall estimate of the learning resulting from a
course. The reasons for needing this summative data have
been mentioned before. They include the needs of the learner,
the instructor, the employer, the professional credentialing
agency, and others legitimately concerned with the degree of
learning achieved by the professional engineer at the
conclusion of a period of training in a continuing education
course of whatever variety. More will be said about this in
Chapter 13.

Summary of Main Features of Post Tests

It is apparent that good post tests for continuing
education courses must have three main features. First, they
should be representative of the breadth and various levels of 

s. Second, they should test the 
ability of students to integrate all of this knowledge and
skill wisely and with technical accuracy in the solution of
complex problems. Third, they should be short, require only
a small amount of time to complete, and permit relatively
quick and objective scoring procedures. It is for the third
reason that the testing of complex skills by the military,
the government, and many industries has come to rely upon
multiple choice and similar short answer objective items.
Tests developed in this format, which follow the procedures
laid down in this and the other chapters, are likely to
be good estimates of the degree of learning outcomes resulting
from continuing education courses. They are also likely to
be efficient in terms of the time required for administration
and scoring. In addition, if used properly, they are likely
to be appreciated, rather than objected to, by enrollees in
continuing education courses. Persons who are learning
complex skills and procedures usually welcome the chance to
demonstrate their newly developed competencies. A large
majority of the professional engineers surveyed in a number
of continuing education courses at the University of Kentucky
and elsewhere have indicated they have no objection to
being tested (Ferry, 1979; Moss et al., 1978).

Adequate versus Ultimate Summative Evaluation 

Post tests are needed to provide independent and efficient
assessments of the degree of learning which occurs at the end
of a course. These assessments are only estimates. The
degree to which the estimates are valid and accurate depends
upon how well the tests have been constructed. There is no
substitute for the assessment of complex performance, and the
products which result from that performance, in actual work
settings. The ultimate summative evaluation of a bridge
design is how well an actual bridge built to the design
specifications performs over a long period of time, perhaps
80 years or more. Yet it is not possible to test the bridge
design by waiting 10, 20, or 80 years to see how well it
actually performs. Rather, one must take what one knows
about bridge design and construction generally, design small
models and simulations which test aspects of the proposed
bridge design, and finally construct inferences on the
basis of these simulations and the general fund of knowledge
about how to actually build bridges. Then bridges are
built, even though the ultimate evaluation has not occurred.

So it is also in the design of courses to teach persons
complex skills and performance capabilities. One cannot wait
and see how well the performance capability has been learned
in a long term sense by course participants before teaching
the course again, any more than bridge builders can wait for
the ultimate full performance test of a bridge design before
constructing additional bridges. In both cases there is too
little time to perform the total and complete summative
evaluation.

Even the supervisors of professional engineers sample
only parts of the engineer's actual performance for close
examination. There is simply not time to observe the entire
performance. In addition, the supervisor samples, for
detailed examination, only some of the products of the
engineer's work. The beginning engineer is watched more
closely, the highly experienced and skilled engineer very
little. It is even less practical for the continuing
education instructor, or the director of a continuing
education course, to monitor closely the actual on-the-job
performance of engineers who have completed training in
particular short courses and other types of continuing
education experiences. All that can be done is to construct
reasonable approximations to some of the more critical
features of performance of some complex procedure. These are
usually abbreviated simulations of aspects of real tasks and
problems which will be faced by the engineer in his or her
work activity. These tasks can be test items similar to
those displayed in Appendix B or some other form of more
practical examination as is often given in laboratory or
project assignments in technical courses. From these types
of assessment tasks it is possible to make inferences about
how well participants are learning the skills and knowledge
which are the intended learning outcomes for specific
continuing education courses.






Chapter 9

DELAYED POST TESTS AND OTHER ASSESSMENTS
OF LEARNING OUTCOMES

No matter how excellent are the results of the initial
tests of a new bridge design which uses new construction
materials and methods, the persons involved are keenly
interested in the actual performance of the bridge once it is
in service. Consequently they observe it closely, note any
problems, and modify designs for future bridges. There is a
parallel for persons who develop and operate continuing
education courses. No matter how popular the course, or
how positive the initial summative evaluation, it pays to
monitor the performance of persons who have completed the
course to learn more about the need for the course, the
degree to which course content and principles are being used
in sound and proper ways, and the general value of the course
to the engineering population it is designed to serve. This
constitutes the first goal for delayed post tests or other
delayed assessment procedures. The information from these
follow-up assessments reveals not only what the participants
have learned, but also whether they are able and willing to
put this knowledge and skill to use in their daily work
activity.

Complex Skills Improve with Time and Experience 

There is another reason for the use of delayed post
tests or other types of follow-up assessment procedures.
Many of the outcomes for continuing education courses in
engineering are very complex performance capabilities. These
capabilities are based upon the proper conceptualization of
many facts, relationships, and concepts, and upon the ability
to apply many principles and theories to a wide range of
practical situations. Since these situations are very
diverse, many times modifications must be made in procedures,
principles, and models if particular problems are to be
solved. No one course can offer the range of problem
situations needed to fully instruct participants. The high
levels of competence in applying course content and knowledge
to the entire range of situations likely to be
encountered in the real work setting come only with much
appropriate experience. The learning activities included
in any course are also only partial samples of what must
really be experienced and understood if performance in the
domain under consideration is to become highly expert. In
such situations the usual goal is to make the student very
familiar with the basic features of what a good performance
is across a finite number of realistic situations. In
addition, one usually also tries to teach something of the
background, theory, principles, and variables which help
the engineer understand better why certain procedures work
well and others are not particularly suited to the solution
of certain types of problems.






It has long been known that when skillful performance
of complex tasks is the goal of instruction, quality of the
performance usually increases after formal instruction is
completed and the person has returned to the job or continued
to pursue study in a related area. Simple facts and recall
of information decrease rapidly with time after instruction.
Yet retention of principles, techniques, and skills in
complex performance areas typically increases and becomes
even more polished long after formal instruction has ceased,
provided they are used.

Tyler (1934) performed some of the early empirical
studies which revealed this principle. He found that
college students in a zoology course forgot 77 percent
of the factual material, such as naming parts of animals,
a year after completion of the course. However, there was
no loss in skill in applying principles or rules learned in
the course to new situations not encountered in the course,
even a year after the completion of the course. In fact,
Tyler's studies showed a 25 percent gain in the skill of
properly interpreting new experimental designs in new
areas a year after the course had been completed (Gage &
Berliner, 1975, p. 143).
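The contrast in Tyler's figures is easier to see with numbers. The sketch below treats both percentages as relative changes from an end-of-course score, which is an assumption about how they were computed, and the starting score of 40 points is hypothetical.

```python
def score_after_year(end_of_course, relative_change):
    """Apply a relative change (such as -0.77 or +0.25) to an end-of-course score."""
    return end_of_course * (1 + relative_change)

# Hypothetical end-of-course score of 40 points on each kind of item.
facts_later = score_after_year(40, -0.77)  # 77 percent of factual material lost
skill_later = score_after_year(40, 0.25)   # 25 percent gain in interpretive skill
```

Under these assumptions the factual score falls to roughly 9 points while the interpretive score rises to 50, so two students with identical post test totals could look very different a year later depending on what the test measured.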

The same results have been observed many times by other
researchers. Complex skills, principles, and procedural
methods are difficult to learn. One cannot simply "look up"
these skills and know how to use them. They come only with
much effort and guided practice, and they get better with
time and experience after instruction is completed in the
formal sense (Cole, 1972; Gagne, 1962, 1965, 1974). Reading
is one example of a complex skill with exactly these types
of properties. One cannot "look up" how to read. It is a
complex performance consisting of many sub-skills which must
be integrated into an efficient and smooth performance.
These types of complex learnings have been referred to as
"process skills" because they are generalizable ways of
processing information and solving problems. Once learned
they are very resistant to forgetting or what psychologists
call extinction. They typically last long, and are
used continuously. Like good wine, they improve with
time (Cole, 1972).

Measuring Growth of "Process" Skills 

Because the skills being taught in many continuing
engineering education courses are of the "process" variety,
it is important to determine if participants who have
completed the course some time ago are growing in the skills
they acquired initially during the course. If they are not,
it probably means they are not using the information and
skill offered in the course in their work. This may be a
good indication that the course is not particularly needed
or relevant. Of course, this may be true for some courses
and not others. Again, it must be remembered that
some engineers and scientists enter even very technical and
specialized courses not because of the desire to use the
information learned in their daily work activity, but simply
because they are curious or wish to become more broadly
informed. However, if it is presumed that the main reason
for a continuing education course is to upgrade specific
knowledge and skill related to increased competence in some
engineering activity or specialty, failure to use concepts and
skills acquired in a course by large numbers of participants
over replications of a course is damning. If the course is
central and vital to better performance, participants should
at least not forget or perform more poorly on a delayed post
test or other assessment procedure as compared to a post test.
Ideally and practically, they should improve in their
performance, particularly if they are applying course
principles and procedures frequently in their work activity.
It must also be remembered that participants will apply
selectively that which they have learned. All of the knowledge
and skill acquired in a course may not be routinely
used on the job. However, those portions of the course which
are relevant to the work activity of the engineer should
come to be learned very well.

Now that we have examined two reasons for the use of
delayed post tests or some other type of assessment procedure,
let us consider other reasons for such assessments.



Purposes for Assessing Delayed Learning Outcomes

Formal administration of tests developed and used earlier
as post tests is one method of determining how much of
course content and skills is being used and retained by past
course participants. If such a procedure is carried out, one
would not usually test all participants from past course
sessions. Rather, one would sample from among those persons,
mail out the test, have persons complete the test, and return
it for scoring. Generally it would be of little interest to
the practicing engineer and his or her employer to participate
in such delayed post testing as a means of assessing the
individual's competence in complex performance areas. As
was noted earlier, it is the daily observation of the
engineer's performance and the products of the performance
which constitute the practical evaluation carried out by the
engineer, his or her colleagues, and the employer and clients
for whom the engineer works. This practical evaluation and
the tacit understanding which it produces among all of these
groups is the most meaningful and significant delayed post
test evaluation. If a course is found wanting by this tacit
evaluation procedure among these groups, it will not be
subscribed to in the future and it is, in fact, functionally
judged to be inappropriate to the needs of practitioners.
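The sampling step mentioned above, testing only some past participants rather than all of them, might be sketched as follows. The roster, the sample size of 20, and the function name are hypothetical; the report does not specify how large a sample to draw or how to draw it, and simple random sampling is only one reasonable choice.

```python
import random

def sample_participants(roster, k, seed=0):
    """Draw k past participants at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same draw can be repeated
    return rng.sample(roster, k)

# Hypothetical roster of past participants pooled from several course sessions.
roster = [f"participant_{i:03d}" for i in range(1, 121)]

# The persons to whom the delayed post test would be mailed.
mailing_list = sample_participants(roster, k=20)
```

Drawing at random, rather than mailing only to participants who are easy to reach, helps keep the delayed results representative of the course as a whole.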

Why, then, would it be advisable to administer a formal
delayed post test? Why would participants of past courses be
willing to engage in such an activity? The answers to these
questions have to do with the need to conduct formal summative
evaluation of courses and their effectiveness by those who
design and teach them. The central issue is the evaluation
of courses, not the evaluation of persons. A formal testing
of the knowledge and skill of a sampling of past course
participants can tell the course instructors and developers
if the content and procedures taught in the course are,
indeed, being practiced and learned to higher levels of
competence by participants after returning to the job. If
this outcome is a goal for the course, it ought to be
measured. If the outcome is achieved as indicated by an
improvement in the scores of participants on a delayed post
test, this information can be very useful. It can provide
some indication of about how much additional practice is
required for participants to become facile with the content
and skills of a given course. This information can be made
public to potential future participants and employers. In
that event both may have a more reasonable expectation about
what to expect as an immediate learning outcome for a
particular short course or similar experience. The
information also communicates what may be required in the
way of on-the-job learning following the course experience
to help the participant become competent to high levels of
expertise in a given skill area. This approach is especially
important in situations where there is a great amount of
complex material to be learned in a short time.



The "Hydrology and Sedimentology of Surface Mined Lands"
course is a good example (Haan & Barfield, 1978). The
material in this course is so complex that all that can be
expected as a reasonable immediate outcome for the three day
short course is that participants will understand how to
approach the design of drainage systems and storage structures
using the latest thinking, models, and techniques. Thorough
familiarity with the appropriate methods and procedures is the
immediate goal. In addition, the course textbook is really
a technical manual which contains all of the information,
models, procedures, tables, computational algorithms, and
rules of thumb typically needed to design any drainage and
storage system for any surface mining operation on any
type of topography, soil type, and climatic condition.
Therefore, a major outcome intended for the short course,
as an immediate objective, is great facility in knowing how
to use the manual. This involves knowing how to locate
appropriate information from charts and tables, and knowing how
to set up a problem given certain conditions and ranges of
values in parameters such as rainfall patterns, soil types,
slopes, local ground cover, and related matters. At the
end of the course, all participants should be able to
demonstrate high levels of competence in the correct use of
the manual for a range of sample problems.

These sample problems are the test tasks which are
embedded throughout and the post test tasks found at the end
of the course. Yet ability to perform well on these tasks
does not insure continued growth and facility in actually
designing good drainage, diversion, and storage systems for
surface run-off in surface mining operations. If the initial
course objectives are achieved, the participants are equipped
to begin to improve their actual designs in this area through
the use of many new conceptual and procedural tools presented
in the three day short course. Actual proper use of these
tools will occur only if the participants return to their
work setting and continue to design such systems and apply
the tools and skills they have initially acquired.

Work Samples as Alternatives to Formal Testing

Another alternative to the administration of a delayed
post test to participants after the completion of a short
course is evaluation of actual work samples. Enough time
should have elapsed to allow for the participants to
actually have engaged in the repeated application of the
skills and knowledge learned in the course. Actual work
samples from on-the-job performance are collected and
evaluated. For example, a random sample of past course
participants can be asked to submit a recent actual drainage
and storage design for a surface mining operation which
they had prepared. These actual products of engineers'
performances can be evaluated and scored much the same way
a laboratory activity of students is scored. Persons who
teach the course can determine to what extent the concepts
and skills in the course and the technical manual are being
used properly and efficiently. Common errors or
misunderstandings can be noted by instructors. This
procedure constitutes the best possible delayed post test
assessment of the participants' ability to actually use the
course content and skills in ways to improve practice.
However, the procedure is time consuming and difficult. It
might take as long as a day to "grade" and evaluate the
design of any one engineer for a given complex problem.
Certainly only a few such thorough evaluations could be
carried out in terms of available time. The value of such
activity would be primarily for a stronger summative
evaluation about the effectiveness of a course than could be
obtained from an end of course post test alone. Again, the
interest in carrying out such delayed post testing is for
the evaluation of courses, not persons.

Advantages of Abbreviated Test Tasks for Delayed Assessment

A more efficient alternative to the evaluation of actual
work samples of past participants' performances would be
administration of a good parallel test similar to the
embedded test tasks and post tests described in the last
chapter. That is, these test tasks or items would be
abbreviated to require only a small amount of the participant's
time. These tests should be constructed to the same
specifications as indicated earlier for pre-test, embedded
test, and post test items. Because an assemblage of such
items could be completed fairly quickly, and because the
items could be cast in such a manner that they could be
scored objectively by use of a multiple choice format or
some similar short answer objective format, the course
developers could very quickly evaluate participants'
performances. What would result is some indication of how
much retention and growth occurred in the use of course
concepts and skills after return to and use of the course
content on the job.

It is important to note that one would want to insure
that those persons sampled for any type of delayed post
testing procedure are actually engaged in work activity which
calls for or requires the application of the course concepts
and skills which were instructed in the short course. If
one sampled many persons who worked in areas which did not
involve the use of the course content in their work, one
would expect no additional growth in skill or knowledge level.
Therefore, any delayed post testing, or other type of
assessment of the stability and growth of course concepts
and skills, should always be accompanied by procedures for
obtaining some other types of information about the
individual's use of the course content in his or her work.
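The point about pairing delayed test results with information on job use can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `uses_content` flag, and the scores are hypothetical and are not drawn from the report.

```python
from statistics import mean

def mean_gain_by_usage(records):
    """Average (delayed post test - post test) score change per usage group.

    records: iterable of (post_score, delayed_score, uses_content) tuples,
    where uses_content is True when the participant reports applying the
    course content on the job (a hypothetical questionnaire field).
    """
    gains = {True: [], False: []}
    for post, delayed, uses_content in records:
        gains[bool(uses_content)].append(delayed - post)
    # Report None for a group with no sampled participants.
    return {group: (mean(vals) if vals else None) for group, vals in gains.items()}

# Illustrative, made-up scores (percent correct).
sample = [(70, 82, True), (65, 75, True), (80, 78, False), (72, 70, False)]
result = mean_gain_by_usage(sample)
```

Keeping the two groups separate avoids averaging away the growth of participants who actually use the course content with the scores of those who do not.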






Questionnaires, Surveys, and Interviews as Delayed Assessment
Procedures

Questionnaires asking about the frequency of use of
course content, directed to the past participants or their
supervisors, are very appropriate. So are other questions
asking about the critical nature of the course content and
skills. Examples of such questions include, "How frequently
do you use the content, skills, and procedures learned in
the Hydrology and Sedimentology course in your work?" or "In
the last three months, how many times have you been faced
with a problem where some knowledge, skill, or procedure
encountered in the Hydrology and Sedimentology course was
essential to the solution of the problem?" Appendix A
contains many examples of these types of questions which can
be used to collect information from past participants and
their employers concerning the degree to which the course
content is meaningful and centrally involved in ongoing work
performance. Past course participants should be asked
questions about the degree of critical usage of the course
content and skills, the frequency of use, the degree to
which participants have recommended other colleagues'
attendance at replications of the particular short course, and
how useful the course was to their ongoing work activity.
The responses to these types of questions are very informative.

Information of this type should be routinely gathered and
used along with test data as part of the delayed assessment
of the effectiveness of short courses and other formats of
continuing engineering education courses.

Data on these types of dimensions can be collected from
systematic interviews conducted with samples of past
participants or employers by telephone. Written
questionnaires and surveys may also be used. In either case,
engineers and their employers are usually willing to
participate in such activities if it is clear to them that
the persons conducting the survey are seeking information
about the value of particular short courses to better meet
the needs of practicing engineers. For the same reason,
these groups are willing to participate in the completion of
formal post tests on course content and skills and to submit
actual products of on-the-job work performance for evaluation
by course instructors.

Role of Delayed Assessment Activity in "Needs" Assessment

Systematic gathering of such information makes the tacit
evaluation of a given short course very clear to the
developers of the course and to those who direct and operate
continuing education programs. In addition, routinely seeking
such information from samples of past continuing education
course participants and their employers conveys to both
groups a sincere interest in the needs of practicing
engineers on the part of continuing engineering educators.
Needs assessment is an often used and abused term in the
jargon of continuing education. It is often implied that all
that is necessary is to "go out into the field" and find out
what it is that prospective clients for continuing education
courses "need" or "want". There is no better way to be
involved in needs assessment activities than continual
interaction with past participants and their employers
concerning the outcomes, intended and unintended, of courses
already in operation. If courses are developed and operated
which serve well the functional needs of professional
engineers, and if follow-up activities on the part of the
persons who operate continuing education programs convey a
sincere interest in making these courses even more effective,
new opportunities to develop additional courses and
continuing education experiences will arise. The value of
these follow-up interactions with participants and their
employers is high both for the summative evaluation of the
degree to which given courses have achieved intended long
term goals and for maintaining an open and easy
communication between the staff of continuing education
programs and the clients they serve.



Conclusion 



Oftentimes persons in professional licensing organizations
and academic circles tend to attribute more credibility to
the results of formal test scores given at the end of courses
as measures of learning outcomes than to information gained
from the types of follow-up and delayed assessment procedures
which have been described in this chapter. In reality, both
types of information are needed if one is to make reasonable
inferences about the degree of learning which results from
a particular short course or similar continuing education
experience.







Chapter 10

DEVELOPING VALID AND RELIABLE TESTS FOR
ASSESSING LEARNING OUTCOMES

Earlier chapters have set forth many procedures for
making valid inferences about learning outcomes resulting
from continuing education courses. There are other ways of
measuring learning outcomes resulting from continuing
engineering education courses besides formal testing. Some
of these alternatives have been described previously. Yet,
formal testing is one way by which valid and reliable
assessments of learning can be made in an efficient manner.
Developing tests which perform in this manner is a demanding
task requiring much time and expertise. However, if a
particular short course or other type of continuing education
instructional activity is to be replicated many times, it is
worthwhile to develop pre-tests, embedded tests, post tests,
and delayed post tests. The utility of these tests is
primarily for assisting in the business of instruction by
determining what it is participants have or have not learned
and which methods and procedures are most effective in
achieving desired learning outcomes.

The remainder of this chapter presents a set of
procedures by which to construct good tests useful for making
inferences about levels of skill and knowledge in specific
performance areas. The procedures apply to the development
of pre-tests, embedded tests, post tests and delayed post
tests. If the procedures are properly carried out, a large
pool of good test items can be developed. These test items
can be assembled into parallel forms of the same test. The
parallel forms should be the same in terms of the performance
capabilities that they test for. They should be different in
that the individual items and tasks of which they are
comprised, although drawn from the same performance domains,
represent different problem situations. If test items are
properly developed, it is possible to use the parallel forms
of a test interchangeably for certain pre-test, post test, and
delayed post test functions. This means that the effort
expended in developing a good test item pool for a frequently
taught course is a good investment although the initial
development of the item pool is costly.
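One way to picture the assembly of parallel forms from an item pool is the short Python sketch below. It is a minimal illustration, assuming each pooled item has already been tagged with the performance objective it measures. The function name, the item identifiers, and the random draw are hypothetical choices and not the report's procedure.

```python
import random
from collections import defaultdict

def assemble_parallel_forms(item_pool, n_forms=3, per_objective=2, seed=0):
    """Split an item pool into parallel forms.

    item_pool: list of (item_id, objective) pairs.
    Each form receives `per_objective` different items for every
    objective, and no item appears on more than one form.
    """
    by_objective = defaultdict(list)
    for item_id, objective in item_pool:
        by_objective[objective].append(item_id)

    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    forms = [[] for _ in range(n_forms)]
    for objective in sorted(by_objective):
        items = by_objective[objective][:]
        needed = n_forms * per_objective
        if len(items) < needed:
            raise ValueError(
                f"objective {objective!r} needs {needed} items, has {len(items)}"
            )
        rng.shuffle(items)
        for f in range(n_forms):
            forms[f].extend(items[f * per_objective:(f + 1) * per_objective])
    return forms

# Hypothetical pool: six items for each of two objectives.
pool = [(f"{obj}{i}", obj) for obj in ("a", "b") for i in range(6)]
forms = assemble_parallel_forms(pool)
```

Drawing the same number of items per objective for every form keeps the forms alike in what they test while the individual problem situations differ.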

Subsequent sections of this chapter deal with each of
the first four main steps in a procedure for developing
valid and reliable tests. Chapter 11 deals with the fifth
step. All five main steps and the various substeps are
presented in Table 3. The procedure listed is based on many
years of test development activities by many persons. There
is a rich theoretical and empirical literature which supports
the procedure outlined. The presentation in these two
chapters is simplified and basic. Persons wishing more
detailed and technical presentations may wish to refer to
other sources such as back issues of the Journal of



Table 3

Steps for Developing Pre-Tests, Embedded Tests,
Post Tests, and Delayed Post Tests by
Which to Estimate Learning Outcomes

I.   Stating Course Objectives in Performance Terms

     Determine desired course objectives.

     a) List the specific desired learning outcomes in
        terms of specific performance capabilities.

     b) State operational criteria by which adequacy of
        the performance is to be judged.

II.  Mapping Test Items to the Full Range of Performance
     Objectives

     Identify, collect, design, and sample from realistic
     problem areas test tasks by which to measure the specific
     performance capabilities listed in step one, across all
     topics and skill areas.

     a) Construct a performance by test item matrix to
        insure proper and complete coverage of performance
        objectives by test item tasks.

     b) Examine the test tasks which have been selected to
        insure they are parallel to instructional tasks by
        which the performance capabilities are to be
        instructed.

     c) Examine the test tasks which have been assembled to
        insure they have been properly abbreviated from
        more complex real life problems. Test items must
        test for critical knowledge and skills in the
        performance of "real world" tasks. Yet, they must
        be able to be completed in a short time, usually a
        few minutes at most.

III. Externally Validating Test Items and Tests

     Validate the test tasks initially assembled into a test
     to insure the test measures what is being taught in
     the course.

     a) Identify other persons expert in the content of the
        course.

     b) Have each expert examine the course content,
        objectives, and instructional tasks against the array
        of test items to locate any areas of omission or
        non-compatibility between instructional tasks and
        goals and test tasks by which to measure learning
        toward these goals.

     c) Identify another small sample of persons expert
        in the content and performance capabilities of
        the course.

     d) Administer the assemblage of test items or
        tasks to this second group of experts. Score
        their performance on each item. Note problem
        items which the experts answer incorrectly or
        with which they have difficulty.

     e) Interview each expert and secure his or her
        suggestions for the improvement of individual
        items and the overall collection of items on
        the test. Revise items and the test accordingly.

     f) Administer the test items to a naive group of
        persons with an engineering or scientific
        background but no particular expertise in the area
        of the performance outcomes taught in the course
        under consideration. Score the performance of
        each naive person. Note items which are answered
        correctly by this naive group. Insure that most
        of the items on the test are not of this basic
        or prerequisite type. Rewrite or prepare new
        items that test for knowledge and skill taught in
        the course, rather than some more general knowledge
        or performance capability.

IV.  Assembling Items Into Different Types of Tests

     Examine each test item which has been developed and sort
     the items into three categories, from the standpoint of
     the knowledge and skill required for successful
     completion of the item. These categories include:

     1) Items which test for only basic knowledge and
        skill assumed upon entry into the course, required
        for success in course learning activities, and
        therefore not instructed in the course.

     2) Items which, in addition to 1, test for knowledge
        and skill outcomes which are the specific
        performance objectives for the course and which
        all persons should be able to respond to correctly
        after completion of the course.

     3) More difficult items which, in addition to 1 and
        2, test for knowledge and skill that may be
        expected to improve with practice after the short
        course is completed because of application of
        course principles and knowledge on the job.

     a) Sort all items into one of the three categories.

     b) Determine the number, variety, and function of
        all tests needed for assessment procedures
        including pre-tests, embedded tests, post tests,
        and delayed post tests.

     c) If desired, assemble items from category one into
        a pre-test. This test would serve only as a
        screening and advising device. It would be
        administered prior to a participant entering a
        course and its results used to make judgment about
        sufficiency of preparation for course activities.

     d) Assemble a comprehensive pre-test, a comprehensive
        post test, and a comprehensive delayed post test
        by using items from all three categories. For
        each test most of the items should be from category
        two, but there should be some items from categories
        one and three as well to insure a proper range of
        easy and difficult items. Insure that each of
        the three tests, with respect to each other, has
        an equal number of parallel items from categories
        1, 2, and 3.

     e) Administer and use the three assembled
        comprehensive test forms interchangeably as pre-tests,
        post tests, and delayed post tests for different
        groups of participants on replications of a
        course, or use each test form for one-third of
        the participants in any given large enrollment
        course in the pre-, post, or delayed post
        test role.

     f) Assemble any embedded tests useful for ongoing
        assessment of learning during instruction by
        drawing items from categories 1 and 2. Note that
        in later stages of learning, items which were
        initially in category 2 move to category one. Be
        sure to use items parallel to but not identical
        to the items sampled for the three forms of the
        comprehensive tests to avoid "teaching for the
        test."

V.   Conducting Item Analysis and Test Reliability Studies

     Using the data collected from test administrations to
     actual groups of course participants:

     a) Determine item difficulty for each item.

     b) Determine the ability of each item to discriminate
        between persons with a good understanding of the
        subject and those with a poor understanding of the
        course content.

     c) Determine the reliability of the tests which have
        been assembled by various procedures, modified to
        be appropriate to criterion referenced testing
        and mastery learning approaches if that is the
        intent of instruction.

     d) If needed, modify and rewrite individual test items
        to produce appropriate levels of difficulty and
        discrimination across items.

     e) Compare the various forms of tests which have been
        developed, such as the comprehensive pre-test, the
        comprehensive post test, and the comprehensive
        delayed post test. If the tests are parallel but
        consist of different items, each may be considered
        an independent measure of the same performance
        capabilities to the degree that all three test
        forms are highly correlated with one another in
        any one role. That is, if all three forms are used
        with three different groups as a comprehensive
        pre-test, the same results should be obtained by all
        three forms in this pre-test role. There should
        be no significant difference in pre-test scores of
        equivalent randomly selected groups of beginning
        course enrollees on any of the three test forms.
        Likewise, any of the three test forms which have
        been developed should achieve the same results
        with course enrollees when alternate forms are
        used in any one role (post test or delayed post
        test) over replications of course offerings.

     f) Plot individual student and group mean scores for
        pre-test, post test, and delayed post tests against
        rank order of participants on some external
        criterion of performance, or against rank order of
        the post test, across replications of the course
        with different groups of participants. If the
        course and the tests are designed properly, post
        test and delayed post test scores should be higher
        than pre-test scores on comprehensive pre-tests.
        If this pattern does not occur there is a serious
        problem with the test or with the instruction or
        both. How high the performance scores should be
        to demonstrate mastery on the test forms is a
        matter of logical determination. Mastery may be
        set to be equivalent to the average score obtained
        by experts in the course content in step III above.
        Mastery may also be set at an arbitrary level such
        as an average of 85 percent correct completion
        of all test tasks.




Educational Measurement, the leading journal in this area.
Many good books are available such as Thorndike's (1971),
Measurement and Evaluation in Psychology and Education;
Martuza's (1977), Applying Norm-Referenced and Criterion-
Referenced Measurement in Education; Nunnally's (1972),
Educational Measurement and Evaluation; Marshall and Hales'
(1972), Classroom Test Construction; or Tyler and Wolf's
(1974), Crucial Issues in Testing, to name only a few. The
American Educational Research Association Monograph Series on
Curriculum Evaluation, volumes 1 through 5 published by Rand
McNally, is another source of excellent articles on this
topic (AERA Monograph Series, 1967-1970).

The procedures presented are ideal. In actual practice,
one cannot always employ all of the steps listed. However,
attention to the procedures will help insure better tests
by which to measure learning outcomes of any course.
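As one illustration of step V in Table 3, the Python sketch below computes item difficulty, a simple upper-lower discrimination index, and the Kuder-Richardson formula 20 reliability coefficient from right/wrong item scores. These particular indices are assumptions made for illustration: the report does not name specific statistics at this point, the 27 percent grouping is only a common convention, and KR-20 is a classical coefficient that may need the modifications for criterion referenced testing which the table mentions.

```python
def item_difficulty(responses):
    """Proportion of persons answering each item correctly.

    responses: list of per-person lists of 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(person[i] for person in responses) / n for i in range(k)]

def item_discrimination(responses, fraction=0.27):
    """Upper-lower index: proportion correct in the top-scoring group
    minus proportion correct in the bottom-scoring group, per item."""
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(len(ranked) * fraction))
    p_upper = item_difficulty(ranked[:g])
    p_lower = item_difficulty(ranked[-g:])
    return [u - l for u, l in zip(p_upper, p_lower)]

def kr20(responses):
    """Kuder-Richardson formula 20, using the population variance
    of total scores (one of several conventions in use)."""
    n = len(responses)
    k = len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores do not vary; KR-20 is undefined")
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / var_total)

# Made-up scores for five persons on four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
difficulty = item_difficulty(scores)
discrimination = item_discrimination(scores)
reliability = kr20(scores)
```

Items with difficulty near 1.0 for a naive group, or with discrimination near zero, would be the candidates for the rewriting described in step V(d).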

Stating Course Objectives in Performance Terms

The first task in the development of a learning
assessment test for any course, as well as the development
of the course itself, is to specify the specific performance
capabilities which are to result from instruction. The
particular performance capabilities of the student following
instruction should be listed in "action verbs" (Gagne, 1965;
Gagne, 1967; Gagne & Briggs, 1974; Manning, 1970; Webb, 1970).
Action verbs clearly point out what it is the person will be
able to do after instruction. They provide a convenient way
to capture the essential features of a performance in
operational terms. Any one action verb is not sufficient
to describe a complex performance in operational terms.
However, a carefully selected array of action verbs can often
provide a good guide to the selection of instructional
activities by which to teach the essential elements of the
performance and also the selection of test tasks by which
to assess the degree to which the performance has been learned.
Examples of action verbs include calculate, design,
construct, compare, recall, select, organize, recognize, and
so forth. How one uses these or similar action verbs to
describe the specific performance expected to result from the
instructional activity is illustrated in Table 4. The
action verbs listed are those for the first unit in the
"Hydrology and Sedimentology" course (Hahn & Barfield, 1978)
described in earlier chapters. The objectives listed in Table
4 are the basis for the sample test items in Table 5.

Each objective in Table 4 includes one or more action
verbs in a statement of what the learner should be able to do
given certain problem conditions and available resources in the
forms of formulas, computational algorithms, charts, and
tables. Collectively, the objectives define the expected
performance outcome capabilities which should be achieved by
students at the end of the unit of instruction. Furthermore,
these descriptions are stated in very operational terms. The
test items in the sample test in Table 5 and Appendix B are



Table 4

Performance Objectives for Open Channel
Hydraulic Structures Unit: An
Illustration of Test Construction Procedures*

Objective  Action        Description of the Performance Required and
Number     Verb(s)       the Conditions Under Which it is to Occur

a          Describe      What happens to the value of Manning's n when
                         the boundary of a channel varies through a
                         range of structural conditions including
                         different types of vegetation, nonvegetated
                         soil aggregates, and man-made lining materials.

b          Recall,       The typical profile of flow velocities (fps)
           Recognize     for hydrologic channels of various cross
                         section shapes at typical slopes.

c          Describe,     The relationship between retardance and flow
           Adjust,       rate in a hydrologic channel and make
           Calculate     adjustments in design specifications (depth,
                         top width, hydraulic radius, slope, and cross
                         section) to produce desired freeboard and
                         channel performance given changes in
                         retardance or flow rates.

d          Calculate     By the limiting velocity method the
                         permissible flow rate for channels given
                         various slopes, required capacities, boundary
                         conditions, soil types, and channel cross
                         sections.

e          Calculate     By appropriate methods and proper use of
                         tables and charts provided, the value of
                         Manning's n for any type of channel given the
                         boundary characteristics.

f          Calculate     The hydraulic radius of channels of differing
                         cross sections according to the appropriate
                         modification of the basic computational
                         algorithms.

g          Calculate     The design specifications for any given
                         channel including the values Vp, R, S, D, T,
                         and necessary freeboard given the
                         specifications for any two of these values
                         and information about soil type, topography,
                         etc.

h          Design,       A hydrologic channel designed to perform to
           Diagram,      stated specifications under stated problem
           Label         conditions, similar to those listed in item
                         g above.

i          Recognize     The reasonableness of design specifications
                         obtained as the solution to a particular
                         design problem involving a hydrologic channel
                         given the problem variables.

j          Use,          Appropriately, computational short cut
           Select,       procedures, computational algorithms, and
           Doublecheck   graphic solutions to complex equations given
                         a variety of problems involving the design of
                         hydrologic channels under widely differing
                         conditions of rainfall, soil type, slope, etc.

*See Appendix B for details about how the performance
descriptions were developed and how test items were designed
to measure each objective.



designed specifically to measure the degree to which these
learning outcomes have been achieved. Study of the sample
test items in Appendix B, the accompanying textual material
in the Appendix, and Tables 4 and 5 in this chapter provides
one illustration about how the essential features of a
complex performance may be operationalized and translated into
specific learning assessment tasks, in this case particular
test items designed to measure each action verb performance
statement.

Mapping Test Items to the Full Range of Performance
Objectives

It is very important to the development of a good test
to be sure that each of the many performance outcomes
expected to result from instruction is tested for by several
different items. This requires an array of items which have
been fitted to or "mapped out" to the full range of specific
performance objectives. This process of mapping out items
to cover all the main performance objectives at their
differing levels of difficulty can be aided by a number of
systematic approaches. Perhaps an example will be informative.
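One systematic aid of this kind is the performance by test item matrix named in step II of Table 3. A minimal Python sketch follows. The objective labels and the item-to-objective assignments are hypothetical placeholders, since the actual mapping for the sample test is documented in Appendix B rather than here.

```python
def coverage_matrix(objectives, item_map):
    """Build an objective -> list-of-items table.

    objectives: list of objective labels.
    item_map: dict mapping an item number to the set of objectives
              that item is judged to test (hypothetical judgments).
    """
    matrix = {objective: [] for objective in objectives}
    for item, tested in sorted(item_map.items()):
        for objective in sorted(tested):
            if objective not in matrix:
                raise KeyError(f"item {item} cites unknown objective {objective!r}")
            matrix[objective].append(item)
    return matrix

def under_covered(matrix, minimum=2):
    """Objectives tested by fewer than `minimum` items."""
    return [obj for obj, items in matrix.items() if len(items) < minimum]

# Placeholder mapping for illustration only.
objectives = ["a", "b", "c"]
item_map = {1: {"b"}, 2: {"a"}, 3: {"a", "c"}, 4: {"c"}}
matrix = coverage_matrix(objectives, item_map)
gaps = under_covered(matrix)
```

Listing the under-covered objectives makes it plain where additional items still need to be written before the test is assembled.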

The sample test in Appendix B has eleven items. For
convenience the 11 test items are presented in Table 5. The
test with its attached reference materials and detailed
narrative explaining how the test items were developed is
found in Appendix B. The purpose of the material in the
Appendix is to assist the reader in generalizing the



Table 5

TEST FOR "OPEN CHANNEL HYDRAULICS" UNIT - Illustrating
the Mapping of Items to Performance Objectives

1.  What is a typical profile of flow velocities (fps) for
    the channel cross section represented in this figure?

    A. a = 4.9, b = 6.5, c = 1.2, d = 2.6
    B. a = 1.2, b = 2.6, c = 4.0, d = 6.5
    C. a = 6.5, b = 4.9, c = 2.6, d = 1.2
    D. a = 6.5, b = 6.2, c = 2.6, d = 2.3





2.  What happens to the value of Manning's n when an
    erodible parabolic cross section open channel is
    vegetated compared to an identical nonvegetated channel?

    A. increases
    B. decreases
    C. remains unchanged
    D. varies with runoff volume

3.  A nonvegetated trapezoidal channel through sandy loam
    colloidal soil has originally been designed to carry 8
    cfs of water down a 4% slope. Suppose the engineer later
    decides to use a vegetated channel. What must he do to
    insure an equivalent capacity with the vegetated channel
    given the same slope, soil conditions, and channel shape?

    A. Select a grass which will grow to a uniform
       height without clumping to assure uniform flow
       rates at the channel perimeter.

    B. Design a somewhat deeper and wider channel to
       allow for the increased retardance of the flow
       caused by the vegetation.

    C. Design a somewhat shallower and narrower channel
       because with vegetation a higher flow rate can
       be sustained.

    D. Maintain the original specifications for the
       nonvegetated channel because the flow capacity will
       remain nearly unchanged.



Table 5 (continued)

shale and'hardSn Th^Sh T^^ channel material' is 
with a 3:1 side siope Ss^Jh^ J° trapezoidal 
questions 4-8: ^^^^ information to answer 

p:^L^^b^^^;;e1o^lt^(?p^'rfS?^rt;r^^f - 

nonvegetated channel? • ^^"^ flowing in this 



A. 
B. 
C. 
D. 



6.0 
3.5 
4.2.7 
4.0 



5. 



Sngl °' Manning's n for this nonvegetated: 



A. 
B. 
C. 
D. 



.037 
.020 
.030 
.025 



6.  Using Manning's equation, $V_p = \frac{1.49}{n} R^{2/3} S^{1/2}$, the



^''sfJiiS JJ"^"? °' "'"'""^l -calculated to be 



A. 



B; 



.48 ft.^ What^ Should he do next? 



C. 



D." 



!j/1.3 or 1 

depth value and the bottom ■ 
wi4th value to provide adequate freeboard in " 
case of a heavy' rainstonrf. in 

an^^r.^^n^*^i5.^^''* approximation, for depth' 
^elaSonllJi^r Reasonable by using l^l 

> bd +_2drl^ I 



r 



ufl}na^?^!'^^f^J?P '^if^V^^ the Channel by. ' 
using the relationship,^t^« b '+ 2dz > 

Calcul^ate 'the, wetted perlra*ter-vaWei>cft: -thA 
channel «sing the relationship id ^ 
determine flow resisfance. ° ' ^ ^ } to 



•JebS««y'?or\i; solu^!^' T""^^^ information in their stems 
in t"? group ^? i?^s!^°" °' contained -in later U^s^ 






Table 5 (continued)

7.  What can be said about the engineer's estimates of the
    values for the depth and bottom width of the channel?

    A. Both values are a reasonable approximation of
       the true values.

    B. Neither value is a reasonable approximation of
       the true value.

    C. The width estimation based on assuming a
       rectangular cross section is only slightly in error.

    D. The depth approximation is based upon assuming
       that R = d and is quite accurate for this channel.

8.  What are the final values which are necessary for the
    depth (D), bottom width (b), and top width (T) of the
    channel if it is to operate at the capacity given in
    the first part of this problem and under the soil and
    slope conditions specified? Include the necessary
    freeboard (ft.).

    A. D = 1.3, b = 1.5, T = 9.26
    B. D = 1.6, b = 1.8, T = 11.1
    C. D = 2.0, b = 7.0, T = 15.0
    D. D = 2.4, b = 7.0, T = 18.0


A parabolic channel is to be designed to carry 25 cfs
of water on a 4% slope. Because the soil is easily
eroded, the designer decides to vegetate the channel
with fescue which is to be unmowed. Use this
information to answer questions 9 - 11.

9.  What is the maximum permissible velocity for water
    flowing through this channel (fps)?

    A. 3
    B. 5

10. What is the retardance class for this vegetated channel?

    A. A
    B. B
    C. C
    D. D

11. What is the hydraulic radius of this channel?

    A. 1.1
    B. .58
    C. .82
    D. 1.6



principles demonstrated in this illustration to the design
of other tests. The test items in Table 5 were written to
test for the presence of the performance objectives listed in
Table 4. Examination of the test items in the sample test and
the performance objectives for which they were generated
reveals several important points.
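For readers less familiar with the computations that several of the sample items call for, the Python sketch below works through the geometry of a trapezoidal channel and Manning's equation in English units. It is offered only as an aid to reading the items. The area and wetted perimeter formulas are the standard textbook relationships and are assumptions here, the function names are illustrative, and the input values are made up rather than taken from the sample problems.

```python
import math

def trapezoid_geometry(b, d, z):
    """Geometry of a trapezoidal channel.

    b: bottom width (ft), d: flow depth (ft),
    z: side slope expressed as z horizontal to 1 vertical.
    Returns (area, wetted_perimeter, hydraulic_radius, top_width).
    """
    area = b * d + z * d ** 2
    wetted_perimeter = b + 2 * d * math.sqrt(1 + z ** 2)
    hydraulic_radius = area / wetted_perimeter
    top_width = b + 2 * d * z  # the relationship T = b + 2dz cited in Table 5
    return area, wetted_perimeter, hydraulic_radius, top_width

def manning_velocity(n, hydraulic_radius, slope):
    """Mean velocity (fps) from Manning's equation, English units:
    V = (1.49 / n) * R^(2/3) * S^(1/2)."""
    return (1.49 / n) * hydraulic_radius ** (2 / 3) * slope ** 0.5

# Made-up example: 2 ft bottom width, 1 ft depth, 3:1 side slopes.
area, perimeter, radius, top = trapezoid_geometry(b=2.0, d=1.0, z=3.0)
velocity = manning_velocity(n=0.025, hydraulic_radius=1.0, slope=0.04)
```

A test item abbreviated from a full design problem typically asks for only one of these quantities, which is what allows it to be completed in a few minutes.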


First, it is apparent that the objectives and the test
items written to measure their achievement vary in difficulty
and level of complexity. The first items and objectives are
simpler, the latter more difficult and complex. The first
objectives and items mainly call for description, recall, and
recognition of certain relationships, facts, and principles.
Later objectives and items call for application of knowledge
of facts, principles, and relationships in the actual solving
of some complex problem.

A good test should always consist of an array of items
across levels of performance from simple to complex. Such
a test reveals much about what parts of the final performance
have been learned well or not so well. Learning complex
performance tasks is a gradual process. It takes much time
and practice. It can be expected that early assessments with
test tasks similar to those displayed in the sample test in
Table 5 will show stronger performance in knowledge of facts
and recall of particular formulas than in other areas dealing
with the skillful and rapid integration of all the subskills
and knowledge into a complex problem solving activity. Thus,
as student performance is examined across pre-tests, embedded
tests, and post tests, there should be much growth,
particularly in the items dealing with skillful application
of course content and skills. If such growth does not appear
there is something wrong with the instruction or with the
test.
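The comparison of growth across item levels can be sketched as follows. This is a minimal Python illustration which assumes that the pre-test and post test forms are parallel, so that the item in each position sits at the same level on both forms. The level labels and the scores are hypothetical.

```python
def proportion_correct_by_level(scores, item_levels):
    """Proportion of correct responses for each item level.

    scores: list of per-person lists of 0/1 item scores.
    item_levels: level label for each item, in item order.
    """
    correct, total = {}, {}
    for person in scores:
        for score, level in zip(person, item_levels):
            correct[level] = correct.get(level, 0) + score
            total[level] = total.get(level, 0) + 1
    return {level: correct[level] / total[level] for level in total}

def growth_by_level(pre_scores, post_scores, item_levels):
    """Post test minus pre-test proportion correct, per item level."""
    pre = proportion_correct_by_level(pre_scores, item_levels)
    post = proportion_correct_by_level(post_scores, item_levels)
    return {level: post[level] - pre[level] for level in pre}

# Made-up scores for two persons on three items.
levels = ["recall", "application", "application"]
pre = [[1, 0, 0], [1, 0, 0]]
post = [[1, 1, 1], [1, 1, 0]]
growth = growth_by_level(pre, post, levels)
```

Little or no growth on the application items would be the signal, described above, that something is wrong with the instruction or with the test. No growth on recall items that were already answered correctly on the pre-test is less informative.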

One common mistake made with test items, especially of
the multiple choice variety, is to include only low level,
recall of factual information types of items. A test composed
of only these types of items is very limited as an assessment
of learning outcomes of a more complex nature. Inferences
about the overall degree of learning resulting from
instruction, based on tests, are valid only so long as the test
items accurately represent the full range of objectives from
simple to complex. Each performance outcome intended as a
goal of instruction must be represented in the test items
presented in order to allow the student to demonstrate what
he or she has learned.

Inspection of the sample performance objectives listed
in Table 4 and the sample test (Table 5) for those performance
objectives reveals another interesting characteristic. For
the particular objectives written for the unit, the test items
presented are only one possible sample of the many items
which could be written to test for students' skill in the
achievement of the intended performance outcomes. For any
given performance objective which is stated in operational
terms, it is possible to design many different test items
which measure, to some degree, the underlying performance of
interest. This is why it is relatively easy to develop many
similar items for use on parallel test forms for use as pre-tests,
embedded tests, post tests, or delayed post tests.
This property of multiple test items for each performance
objective becomes more pronounced as the performance objectives
become more broad and inclusive. Thus, it is even easier to
write many parallel form test items for complex and high
level performance objectives than for simple performance
objectives which require only the memorization and recall
of specific information.

Inspection of the sample objectives and the test items
developed for these objectives also reveals another important
relationship. Any given test item may test for more than
one specific performance objective. Complex and demanding
test items usually test not only for the ability to integrate
and apply much knowledge, skill, and judgment into the correct
approach and solution of a problem. In addition, they test
for knowledge and recall of appropriate facts, relationships,
procedures, and skills in bringing all of these components
together appropriately into a competent and skilled
performance. Items 4 through 8 and 9 through 11 in the test
in Table 5 illustrate this property. Each of these items
tests for knowledge at multiple levels. It is desirable to
develop and use such multiple performance assessment test
items. However, for every test item it should be clear what
particular performance outcomes are being measured. Care must
also be taken to insure that the performance demanded by the
item can be completed in a short and reasonable length of
time for one test item. An earlier chapter has provided
guidelines and examples of how to abbreviate such complex
performance assessment tasks without losing the essence of
the main features required for a skillful performance. The
additional explanatory material in Appendix B for the test in
Table 5 also provides information about how to abbreviate
complex performance tasks to make them into test items.

Taxonomic and Task Analysis Approaches to Mapping Items and
Objectives

There are many procedures for monitoring the appropriate
level and sequencing of a hierarchy of performance objectives
and test tasks in the design of instructional activities and
the development of a range of test items by which to assess
achievement of the objectives. One good general guide to
striking an appropriate balance of objectives and items, from
a simple and fact oriented to a complex and skill oriented
focus, is the Taxonomy of Educational Objectives: Handbook
I, Cognitive Domain (Bloom, 1956). This manual provides
many specific suggestions and illustrations of how to
assemble reasonable samples of both performance objectives
and test tasks and items by which to assess performance
across various levels. Another source which provides detailed
procedures and examples of how to construct arrays of
performance objectives and test items is the Handbook on
Formative and Summative Evaluation of Student Learning
(Bloom et al., 1971).

Another approach to the problem may be found in task
analysis interpretations of complex performances. Under
this approach, the final performance capability intended as
the outcome of instruction is described in all of its
complexity. Then one asks oneself and others expert in the
performance domain, "What knowledge and skill are prerequisite
to being able to produce the final skilled performance?" In
this manner, a series of knowledges and skills are
conceptualized in a descending hierarchy from very highly
skilled and complex performance to very basic levels of
knowledge and skill. After the conceptualization of the complex
performance is "task analyzed," the various sub-skills and
knowledges become the focus of particular learning objectives
which are now arranged in an ascending hierarchy. The
intended performance objectives at each level of the
hierarchy are used to define the content, topics, learning
activities, and instructional materials which will be used
to instruct participants up the hierarchy of knowledge and
skill. The same conceptual hierarchy is used to develop a
series of test or assessment tasks which parallel the
instructional tasks. The task analysis approach to the
design of instructional objectives, instructional tasks
and activities by which to teach the performance objectives,
and the test items by which to assess learning resulting
from instruction, is particularly well suited to technical
and scientific fields. This is because in these fields there
is a natural cumulation of information, concepts, skills,
and complex procedures into integrated and very high level
performances (Gagne, 1962; Gagne, 1967; Gagne & Paradise,
1961; Salvendy & Seymour, 1973). The sample performance
objectives in Table 4 and the sample test items in Table 5
were developed in a task analysis manner and provide a
concrete illustration of the procedure.
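
The step from a descending task analysis to an ascending
hierarchy of objectives can be sketched in code. What follows
is only an illustration, assuming that each skill is recorded
together with the sub-skills directly prerequisite to it. The
skill names are hypothetical placeholders and are not the
entries of Table 4, and the use of a topological ordering is an
assumption of this sketch rather than a procedure named in the
sources cited.

```python
from graphlib import TopologicalSorter

# Hypothetical task analysis: each skill maps to the sub-skills that are
# prerequisite to it (the "descending" view, from complex to basic).
PREREQUISITES = {
    "design a stable channel": {"compute flow velocity", "select a channel lining"},
    "compute flow velocity": {"recall the velocity formula", "compute hydraulic radius"},
    "select a channel lining": {"recall permissible velocities"},
    "compute hydraulic radius": {"recall channel geometry terms"},
}


def ascending_hierarchy(prerequisites):
    """Return skills ordered so every prerequisite precedes what builds on it."""
    return list(TopologicalSorter(prerequisites).static_order())


if __name__ == "__main__":
    for step, skill in enumerate(ascending_hierarchy(PREREQUISITES), start=1):
        print(step, skill)
```

The same ordered list can serve twice, once to sequence
instruction and once to check that test tasks exist at every
level of the hierarchy.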

Persons interested in the task analysis approach to
the design of instructional objectives, instructional
activities, and assessment tasks will find The Conditions of
Learning (Gagne, 1977) and Principles of Instructional Design
(Gagne & Briggs, 1974) to be of value. Both of these texts
provide many details for using task analysis procedures. In
addition, many studies which are excellent illustrations of
how to use the procedure in the design of technical courses
are referenced. Many of the examples are from the technical
and scientific fields. Another good source is a dissertation
titled An Evaluation Model for Developmental Growth (Marion,
1978). Marion's research is particularly interesting since
it deals explicitly with the measurement of complex
performance capabilities resulting from technical training
programs. In addition, Marion's procedures are explicitly
conceptualized with a learning hierarchy framework and
designed to integrate diverse types of information from
areas such as formal test scores, actual student performance
in laboratory or clinical settings, and instructor or
supervisor ratings into a common assessment of the degree
of student learning.

Either the taxonomic approach of Bloom or the task
analysis approach of Gagne, Marion and others can produce
a set of performance objectives and assessment tasks which
are very operational. These approaches can also be nicely
integrated with one another. Either approach also tends
to produce a proper array of test items representative of the
full knowledge and skill domain required for adequate
performance of complex tasks. This is especially so if
attempts are made to group performance objectives,
instructional activities and tasks, and test tasks in real
world problem solving tasks typically encountered in practice
by the engineer (Salvendy & Seymour, 1973).

Constructing a Matrix of Objectives by Test Items by Topics

No matter what approach is used to generate test items,
it is a good idea to construct a matrix of objectives by
items for each topic included in a course. This insures the
development of a better criterion by which to measure learning
outcomes (Manning, 1970). The particular performance
objectives may be listed in a column, similar to the
presentation of the sample objectives for the unit in the
Hydrology and Sedimentology course in Table 4. Test items
which have been developed can be sorted into rows
corresponding to each level of objective represented in the
column. This procedure quickly reveals any imbalance of
items in the total array to be used for the test. If too
many items are written at the information level and too few
at the application of principles level it becomes immediately
apparent. Gaps often will be identified for levels of
objectives for which there are no items, but for which items
need to be developed if the test is to be representative of
the range of content and skills instructed. It is also
possible to continue this same matrix construction for
other content areas of a given course. For example, in
the "Hydrology and Sedimentology" course, one could use the
same hierarchy of action verbs in a sequence of objectives
across not only the first unit in the course on open
channel hydraulics, but for each of the other five units in
the course as well. A separate matrix may be constructed for
each major topic or unit in the course. What results is a
matrix of objectives by test items by topics across the
course. In this way, the matrix of specific performance
objectives at various levels by content or topic areas can
be completely laid out. With such a structure it is much
easier to map out a representative set of test items capable
of providing a good assessment of the degree of learning
achieved by students following instruction. Persons interested



insuring an appropriate coverage of all performance and content
areas by test items, there remains still another task
before the test is assembled. It is necessary to insure
that each test item can be completed in a short period of
time. As was indicated in Chapter 7, test items and tasks
must be abbreviated situations which call for the
performance capabilities required in applying course content
and skill to the solution of real problems in some complex
performance domain. The test must always be much shorter
than the time available for instruction and the time
available for formal instruction must always be shorter than
the time available in the work setting for the actual use
of concepts and skills which are the focus of instruction.
Unless one is careful to abbreviate test tasks and items,
one ends up with assessments of learning which are very
incomplete since all of the available testing time is spent
on the completion of one or two test tasks. If the test
tasks are complex and require a comprehensive integration of
course knowledge and skills, the test can be more
representative of the performance domain being taught.
However, it is generally better to have multiple and
independent indicators of learning outcomes by which to make
inferences about persons' competence in complex performance
areas. Thus many shortened or abbreviated test items are
usually preferable to only one or two major and time
consuming test tasks. One exception to this rule is the
laboratory practical examination where performance is assessed
by directly observing the person design and conduct an
experiment, construct a computer program, or analyze the chemical
properties of an unknown material. However, for many courses,
tests remain the most practical and
useful means of estimation of learning achievement short of
actual observation of on-the-job performance and the
products resulting from this performance.

Chapter 7 outlined many procedures by which complex
performance tasks can be abbreviated into efficient and
brief test tasks or items. Consequently, no more will be
said about this matter here.

Externally Validating Test Items and Tests

All of the previous activities are directed toward
developing tests which are valid indicators of the learning
outcomes which result from a course. However, so far, all of
the validation procedures are internal. They are based upon
the judgment of the course developer and test item constructor.
In actual practice the course developer and test item
constructor tend to be the same person or persons. This is
perfectly normal and desirable. As is pointed out in Chapter
7, test items and instructional tasks are closely related.
Both are drawn from the same domain of performance capabilities.
Test items should be rooted in the same array of content,
knowledge, and skill as are the instructional tasks which are
selected for teaching a course. However, before a test is
developed for repeated and wide use with many replications
of a course, the validity of the test for the performance
domain being dealt with should be checked by persons
external to the course and its development.

There are two basic ways to approach this external
validity check of initial forms of tests which have been
developed. The first way involves identification of a few
persons who are very expert in the content and skills of the
course. The number can be quite small, consisting of only
two or three persons. These persons can be given the
performance objective by item by topic matrix as well as
the actual assembled test which has been constructed
initially. In addition, these experts should also be given
a description of the teaching methodology and a set of the
course instructional materials. These experts can then be
asked to examine all of these materials and make observations
about the adequacy and scope of test items (and instructional
tasks and objectives for that matter) in terms of their own
expert opinion of what is required to exhibit skilled
performance in the content area under consideration. The
expertise of these external reviewers allows them to make
reasonable and independent judgments about the adequacy of
test items by which to make inferences about a person's
ability in the content area under consideration.



There is another important reason to interview the 
experts who have studied or actually taken the test. If the
test items are too easy, and if they represent only the
basic and entry level skills required for skilled practice
in some complex performance area, test score data from the
group of experts would by itself not reveal this weakness.
Yet a dialogue with the persons completing the test would
almost certainly do so.

Another strategy is to locate a group of persons naive 
in the particular performance area for which the test has' 
been developed. These persons should have some common 
technical background and training compared to the persons 
who are expert in the area. However, they should lack the
particular training, experience, and background in the
performance area which is the focus of the short course for
which the test is developed. This naive group can be
administered the initial test which has been developed for
the course. Obviously, the performance of these persons
should be poor in terms of mastery of the test material. If
this group of naive persons performs at high levels on the
test, it means that the test items are not a valid indication
of what has been learned in the course. There may be problems
with too many low level items being included in the test or
with other high level items being constructed in such a
manner as to reveal the correct answer or the way to obtain
the correct answer. In such an event, the test items need to
be reworked. Once again, interviewing of individuals who
obtained high scores on the test, without apparent prior
knowledge of the content area, can provide much information
about how to revise existing items or construct new items
to be more valid measures of particular performance
objectives.
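
The naive group check described above lends itself to a simple
tally. The sketch below is offered only as an illustration: it
assumes each item has been scored 1 (correct) or 0 (incorrect)
for every naive test taker, and it flags items answered
correctly by at least half of the group. The function name, the
data layout, and the 0.5 cutoff are all assumptions made for
the example and are not values given in this report.

```python
def flag_suspect_items(naive_responses, threshold=0.5):
    """Return (item id, proportion correct) for items naive persons pass too often.

    naive_responses maps an item id to a list of 1 (correct) / 0 (incorrect)
    scores, one per naive test taker. The default threshold is an arbitrary
    illustration, not a value taken from the report.
    """
    flagged = []
    for item_id, scores in naive_responses.items():
        if not scores:
            continue  # no data for this item, nothing to judge
        proportion_correct = sum(scores) / len(scores)
        if proportion_correct >= threshold:
            flagged.append((item_id, proportion_correct))
    # Highest proportions first, so they can be taken up first in interviews.
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores from six naive test takers on four items.
    naive = {
        "item_01": [1, 1, 1, 0, 1, 1],
        "item_02": [0, 0, 1, 0, 0, 0],
        "item_03": [1, 0, 1, 1, 0, 1],
        "item_04": [0, 0, 0, 0, 0, 0],
    }
    for item_id, p in flag_suspect_items(naive):
        print(f"{item_id}: {p:.2f} of naive test takers answered correctly")
```

A tally of this kind only points to items worth discussing. As
the text stresses, the interviews with high scoring naive
persons are what reveal why an item gave its answer away.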

It should be emphasized that the numbers of persons in
these groups of expert judges, expert test takers, and naive
test takers need not be large. If one has only two expert
judges independent of the course development activity, only
five to eight expert test takers, and only five to eight
naive test takers, much can be learned from the results which
will improve the validity of the test being developed. It
is far better to use such small and independent samples by
which to externally validate a test which is under
development, especially if one interviews the persons in
these groups after their activity, than to use large numbers
of persons and rework test items only on the basis of the
test scores and individual item characteristics. Groups of
these compositions and sizes are very adequate to the task
of identifying the more serious problems with test items.
Usually any serious problems will be identified by multiple
persons in the expert group even with only a few persons
involved.



Assembling Items Into Different Types of Tests 

The initial development of the levels of performance
objectives by course topics by test items matrix will provide
much information about the general difficulty level of the
tests. However, another sorting of the pool of test items
into three categories can help in assembling tests likely to
be effective in discriminating among different levels of
achievement of learning outcomes.
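
The objectives by topics by test items matrix referred to here
can be tallied mechanically once each item is tagged with the
level of objective and the topic it addresses. The following
is a minimal sketch under that assumption. The level labels,
topic labels, and item identifiers are hypothetical and would
in practice come from the course's own list of objectives.

```python
from collections import Counter

# Hypothetical labels; a real course would take them from its own objectives.
LEVELS = ["recall", "describe", "apply", "solve"]
TOPICS = ["unit 1", "unit 2"]


def build_matrix(items, levels=LEVELS, topics=TOPICS):
    """Count items in each (level, topic) cell; items are (id, level, topic)."""
    counts = Counter((level, topic) for _, level, topic in items)
    return {level: {topic: counts[(level, topic)] for topic in topics}
            for level in levels}


def empty_cells(matrix):
    """List the (level, topic) cells for which no item has been written."""
    return [(level, topic)
            for level, row in matrix.items()
            for topic, n in row.items() if n == 0]


if __name__ == "__main__":
    pool = [
        ("q1", "recall", "unit 1"), ("q2", "recall", "unit 1"),
        ("q3", "recall", "unit 2"), ("q4", "describe", "unit 1"),
        ("q5", "apply", "unit 1"), ("q6", "describe", "unit 2"),
    ]
    matrix = build_matrix(pool)
    print(f"{'level':<10}" + "".join(f"{t:>8}" for t in TOPICS))
    for level, row in matrix.items():
        print(f"{level:<10}" + "".join(f"{row[t]:>8}" for t in TOPICS))
    print("gaps:", empty_cells(matrix))
```

Printing the counts in rows and columns makes an excess of low
level items, or a level with no items at all, visible at a
glance.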

The three levels are found in Table 3 on page 161. In
the first level are those items which represent knowledge and
skill prerequisite to the course and its successful engagement
by the learner. The knowledge and skill represented in these
test items is believed to be necessary to learning the content
of the course. However, it is not to be taught in the
course because there is limited available time, much content,
and students must be assumed to have an entry level of
knowledge and skill in order to proceed.

In actual practice, persons who enroll in continuing
engineering education courses vary a great deal in the degree
to which they have mastered prerequisite knowledge and skill
(Weisehugel, 1978). Therefore, in the interest of
documentation of the growth of individuals' learning, as
well as the average effectiveness of a course in improving
the learning of groups of individuals, it is necessary to
have some idea of where participants actually entered in
respect to expected prerequisite levels of knowledge and
skill. For example, suppose a course had been developed
for teaching practicing civil engineers how to use computer
simulations to test the performance of various structures and
systems they normally design. Also suppose that the course
was well designed and had the potential for being very
effective in teaching practicing engineers methods of
computer simulation by which to test the adequacy of complex
designs. Let us also suppose that the prerequisite skills
require participants to be facile with Fortran or another
computer language. In addition, suppose the participants are
also expected to be very familiar with the use of computers
and computer terminals. Now, suppose 40 percent of the
participants for a given offering of the course did not
possess these prerequisites to a high degree. These students
would undoubtedly have difficulty in doing the course learning
activities and would also be likely to perform poorly on any
post test which was reasonably representative of course
knowledge and skill objectives. If the success of the course
were judged on the post test scores alone, it might be seen
as a poor course, or at least as not being effective for a
large number of persons, which, indeed, it was not. However,
without some measure of the entry level skill and knowledge
capability of the participants, one could not be sure of the
reason for the lack of success. The same result could be
achieved from poor organization of instruction, poor
presentation of course material, a hostile attitude of the
instructor, or distracting and inadequate conditions in the
physical setting of the learning environment.



For this reason, it is generally wise to incorporate
some items which measure prerequisite skills and knowledge
levels in tests developed to assess learning outcomes of
courses. This can often be accomplished by a relatively
small number of items, perhaps as few as four or five, each
item carefully selected to require the performance of some
specific prerequisite skill or the recall of some specific
procedure or information.

There are two ways to approach the inclusion of items
which measure the knowledge and skills required for entry
into and completion of course activities. The first method
is to produce a test designed specifically to measure
prerequisite knowledge and skill. This type of test can be
mailed out to participants in advance or administered in one
central place ahead of time. The test can be scored and the
results used in an advisory way to place a participant in a
course appropriate to his or her needs and present level of
capability. The purpose of the test is to screen and advise
persons concerning entry into a given course. Persons scoring
low on such a prerequisite skills and knowledge test should
not be restricted from entering the course in most cases.
Rather, they should be advised that they would do well to not
enroll in the course at this time, and also be advised about
what they need to do to prepare for the course if they
wish to enroll in the future. An exception to this would
be in areas where entry into the course without the
prerequisite skills and knowledge would be dangerous or
destructive to property and life. Until a pilot demonstrates
strong proficiency on a wide variety of prerequisite knowledge
and skill tasks, including written tests and physical
performance tasks in a Link trainer or some similar device,
he or she should not be permitted to enter the next phase of
flight instruction, e.g., actual flying. Similar restrictions
operate in the use of complex, expensive, and potentially
dangerous equipment as is often found in laboratories and
industry.

Another way to provide information on the general entry
level skill and knowledge of participants in prerequisites is
to prepare comprehensive pre-tests. Under this approach, one
prepares a test comprised mainly of items which are sampled
from the domains of performance being instructed in the
course. Many or most of these test tasks or items would
also require prerequisite knowledge and skill. However, if
a person missed these items on a test, one would not know if
it was because he or she did not have the necessary
prerequisites or if the individual had not learned or
misunderstood something in the new knowledge or skill area being
taught. The way to avoid this problem is to include a
small number of items on the comprehensive pre-test which
require only the expected prerequisite knowledge and skill
for their correct completion, and nothing from the present
course being taught. Again, these items can usually be few
in number, being the same types of items as would be used in
a screening pre-test, but serving a slightly different
purpose. Here responses to these items would serve to
reveal information about the variability of the entry level
skills and knowledge of participants actually admitted to a
course. The purpose would be to make better inferences about
individual learning and general course effectiveness.
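
One way to keep the two sources of error apart is to score the
prerequisite-only items separately from the course items. The
sketch below illustrates the idea under simple assumptions:
each item carries a category label, each response is scored 1
or 0, and a subscore is the proportion of items correct in a
category. The labels, identifiers, and function name are
hypothetical.

```python
def subscores(responses, item_category):
    """Proportion correct per item category for one participant.

    responses maps item id -> 1 or 0; item_category maps item id -> a label
    such as "prerequisite" or "course". Both layouts are illustrative only.
    """
    totals, correct = {}, {}
    for item_id, score in responses.items():
        category = item_category[item_id]
        totals[category] = totals.get(category, 0) + 1
        correct[category] = correct.get(category, 0) + score
    return {category: correct[category] / totals[category] for category in totals}


if __name__ == "__main__":
    categories = {"p1": "prerequisite", "p2": "prerequisite",
                  "c1": "course", "c2": "course", "c3": "course", "c4": "course"}
    # Hypothetical participant who misses both prerequisite-only items.
    participant = {"p1": 0, "p2": 0, "c1": 0, "c2": 1, "c3": 0, "c4": 0}
    print(subscores(participant, categories))
```

A low course subscore together with a low prerequisite subscore
suggests a gap in entry level skills, whereas a low course
subscore with a high prerequisite subscore points back to the
instruction or to the items themselves.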

There is another reason for developing and using 
comprehensive pre-tests. In fact, when such a test has been 
developed it becomes capable of being used as a pre-test,
a post test, or a delayed post test. This is because any
comprehensive test includes items from across all three
categories shown in Table 3, page 161. Most of the items
should be from the second category and be directly related
to the intended learning outcomes for the course. These
items should be tasks the learner can be expected to perform
correctly having completed the course and all of its
instructional activities. In addition, there should be a
small number of items which test for only key prerequisite
skills and knowledge, for the reasons described above.
However, there should also be a small number of items from
the third category, a sampling of test tasks which the person
cannot yet be expected to perform very well after only
having completed the course. These items should permit the
learner to demonstrate skill and knowledge beyond that
expected to result immediately from the completion of course
activities. When such a comprehensive test has been assembled
its use in the role of a pre-, post, or delayed post test
reveals information about the range of competencies of
participants, which are often very variable, especially upon
entry to the course.

As in the case of the inclusion of items related only to
prerequisite knowledge and skill, only a few items which go
beyond the levels of skill and knowledge expected to be
achieved in the course need to be included. The presence of
these items on the test allows those persons who have high
levels of capability to not be restricted by the test.
Inclusion of these more difficult items also makes the test
useful as a delayed post test. It has been noted earlier that
for most continuing education courses in engineering, one
should expect the persons who have completed the course to
learn even more after the short course is completed and
persons have returned to the work setting to apply course
content. To be sensitive to this additional facility with
the knowledge and skills of the course and their appropriate
application, some items of the more demanding type need to
be included. It is also usually the case that some
participants will have entered with higher levels of
knowledge and skill in a particular course than have other
students. It is not uncommon for some students to learn
more than might be expected from only what is actually
instructed.

In summary, a comprehensive test consisting of a sample
of items from all three levels provides more information
about the entry and exit levels of participants' skills and
knowledge on an individual basis and as a means of making
inferences about the effectiveness of the course in general.
To properly discriminate among persons with differing levels
of ability, a test must be composed of an array of items
of differing difficulty. Including items from each of the
three categories is one way which is likely to produce an
appropriate range of difficulties in the items assembled for
a test.

When comprehensive tests are assembled, care should be
taken to have an item pool which is several times larger in
terms of numbers of items than the length of the tests which
are to be constructed. Suppose that one wishes to construct
a comprehensive test of about 20 items in length for a
particular course. Suppose that it is also decided that a
pre-test, post test, and a delayed post test would be useful.
In addition, four items are included to test prerequisite
knowledge and skill. Four more items, which are very
difficult and demand learning beyond the level included in
course learning activities, are also included. The remaining
12 items are all related to specific performance objectives
taught in the course. If three such parallel tests were to
be constructed, it would be necessary to have a minimum of
three times these numbers of items in the item pool in each
category. Thus there should be 12 items of the prerequisite
knowledge and skill type, 36 items related to specific course
performance objectives, and 12 items related to transfer of
course knowledge and skill to more complex problems not
dealt with in the course, but which might be reasonably
expected to be samples of actual applications following
course activities.
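
The arithmetic of this example is easily checked. The short
sketch below simply multiplies the number of items per form in
each category by the number of parallel forms wanted. The
category labels and the function name are chosen for the
illustration only.

```python
def minimum_pool(items_per_form, parallel_forms):
    """Smallest item pool, by category, for the given number of parallel forms."""
    return {category: n * parallel_forms
            for category, n in items_per_form.items()}


if __name__ == "__main__":
    # The 20-item test described in the text: 4 + 12 + 4 items.
    form_blueprint = {"prerequisite": 4, "course objectives": 12, "transfer": 4}
    pool = minimum_pool(form_blueprint, parallel_forms=3)
    print(pool)
    print(sum(form_blueprint.values()), "items per form,",
          sum(pool.values()), "items in the pool")
```

These figures are minimums. A larger pool leaves room to
discard items that later prove faulty.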



One would then construct three parallel test forms from
this array of items. First, one of the three parallel items
from the prerequisite knowledge and skill item pool for a
given prerequisite would be randomly assigned to each of the
three forms of the test. This process would continue until
each of the triplet of items for each of the remaining three
prerequisites was assigned randomly to one of the three forms
of the test. The same procedure would be repeated with the
12 triplets of items in the performance outcome item pool.
The procedure would be repeated again for the four triplets
of items in the transfer pool. What would result would be
three parallel forms of one test. Any form of the test
could be used in any of the three roles of comprehensive
pre-, post, or delayed post test. To the degree that the
tests were actually empirically as well as conceptually
parallel, a set of three test scores across an individual
would determine his or her entry level, exit level, and
subsequent improvement or decrement in knowledge and skill in
course content at some time after the completion of the
course. Over replications of courses and random assignment
of persons to test forms the statistical significance of each
form of the test can be empirically determined by methods
similar to those presented in Chapter 13. If the test forms
are found not to be equivalent, adjustments can be made in
particular items on various forms to produce parallel items
and tests.
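
The random assignment of triplets to forms can be sketched as
follows. This is an illustration only, assuming that the item
pool is already organized as triplets of parallel items, one
triplet per prerequisite, objective, or transfer task. The
form names, the item identifiers, and the seed argument (used
only to make the example repeatable) are assumptions of the
sketch.

```python
import random


def assemble_parallel_forms(triplets, form_names=("A", "B", "C"), seed=None):
    """Randomly deal one item from each triplet of parallel items to each form.

    triplets is a list of 3-tuples of item ids written for the same
    performance. The seed only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    forms = {name: [] for name in form_names}
    for triplet in triplets:
        if len(triplet) != len(form_names):
            raise ValueError("each triplet needs one parallel item per form")
        shuffled = list(triplet)
        rng.shuffle(shuffled)  # random order decides which form gets which item
        for name, item in zip(form_names, shuffled):
            forms[name].append(item)
    return forms


if __name__ == "__main__":
    # Hypothetical pool: 4 prerequisite, 12 course, and 4 transfer triplets,
    # each holding three parallel items labeled a, b, and c.
    pool = [(f"{kind}{n}a", f"{kind}{n}b", f"{kind}{n}c")
            for kind, count in (("pre", 4), ("obj", 12), ("xfer", 4))
            for n in range(1, count + 1)]
    forms = assemble_parallel_forms(pool, seed=1)
    for name, items in forms.items():
        print(name, len(items), items[:3])
```

Because every form receives exactly one item from every
triplet, the three forms match in content coverage by
construction. Whether they also match empirically remains to
be checked with actual scores.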

It is much more likely that parallel items and tests will
be developed if one begins with a listing of specific
performance outcomes, as is shown in Table 4, and if a task
analysis procedure has been used to generate the performance
objectives for the course. For short courses this specific
mapping out of the key performance aspects of the tasks to
be taught and the use of action verbs to operationalize both
the instructional tasks and the test items is a sound
approach. Each action verb and performance description
provided (see Table 4) can easily produce a number of items
identical with respect to the performance being tested, but
unique in terms of the specifics of the problem situation.
Each of the parallel items thus produced may then be randomly
assigned to any test function or form.

The actual degree to which test forms are parallel can
be determined by a variety of means, the most common ones
being based on analyses of group means and variances for
comparable groups of participants for given roles of the
test, e.g., pre-test, post test, or delayed post test. There
are two ways to do this. Over subsequent replications
of a course, one can use randomly each form of the test in
a different role. Thus for the first group, form A can be
used as a pre-test, form B as the post test, and form C for
the delayed post test. For the next replication of the course
the order can be reversed. Still later replications can be
used to assign form B to the pre- or delayed post test role.
If it can be assumed that the groups of participants are
comparable, the means and variances of any form of the test
ought not to be statistically significantly different from
one another when used in any particular role. This should
be especially true for the post test role where participants,
immediately following completion of the course and having had
a common developmental or learning experience, are likely to
be most alike.
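One possible way to examine whether forms behave alike in a given
role is sketched below. It tabulates the mean and variance of each
form and computes a one-way analysis of variance F ratio. The
choice of the F ratio is an assumption made for illustration, not
a method named in this chapter, and the scores are invented. The F
value would still have to be compared with a tabled critical value
for (k - 1, n - k) degrees of freedom.

```python
from statistics import mean, variance


def summarize_forms(scores_by_form):
    """Return {form: (mean, sample variance)} for one test role."""
    return {form: (mean(s), variance(s)) for form, s in scores_by_form.items()}


def one_way_f(scores_by_form):
    """One-way ANOVA F ratio comparing the mean scores of the forms."""
    groups = list(scores_by_form.values())
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical post test scores from comparable groups, one group per form.
post_test = {"A": [20, 22, 21, 23], "B": [21, 22, 20, 23], "C": [22, 21, 23, 20]}
summary = summarize_forms(post_test)
f_ratio = one_way_f(post_test)
```

An F ratio near zero, as in this invented data, is what one would
hope to see if the forms are parallel in that role.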

An alternative is to use all three forms of the test in
all three roles for each course replication. That is, one
third of the participants are randomly assigned form A for the
pre-test; another third, form B; and the remaining third, form
C. The same pattern is followed for assignment of persons in
the course for the post test with the constraint that no
individual may be assigned the form he or she was previously
assigned. The delayed post test assignments are made in
the same manner. Again, if there are no significant
statistical differences between the means of the three forms
of the test in each of the three roles, the test forms are
supported as being parallel empirically as well as
conceptually. Under this second plan, one might have to
accumulate data over several replications of a course to
have enough persons' scores on each form of the test to make
strong inferences about the parallel nature of the tests.
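The second plan can be sketched as follows. The sketch assumes
that each participant is given one of the three cyclic orderings
of the forms (A-B-C, B-C-A, C-A-B), which is one convenient way,
though not the only way, to satisfy the constraint that no person
meets the same form twice. All names are hypothetical.

```python
import random

ROLES = ("pre", "post", "delayed")
# Cyclic orderings: every form appears once in every role position.
ROTATIONS = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]


def assign_forms(participants, seed=None):
    """Randomly give each participant a different form for each test role."""
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)  # random order, then deal the rotations out in turn
    plan = {}
    for index, person in enumerate(order):
        plan[person] = dict(zip(ROLES, ROTATIONS[index % 3]))
    return plan


# Hypothetical roster of nine participants.
roster = [f"participant_{n}" for n in range(1, 10)]
plan = assign_forms(roster, seed=7)
```

When the number of participants is a multiple of three, each form
is used by exactly one third of the group in each role.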

Any embedded test tasks which are to be used ought to
be another set of parallel items, not those used for the
construction of the pre-, post, or delayed post test. One
can economize somewhat by using the pre-test items as embedded
test items and reserving the other two forms of the test for
the post and delayed post test. The important condition to
maintain is to keep post tests independent from the embedded
test tasks. The latter are primarily used to teach the
student and help the instructor guide practice activities for
the student, while the post and delayed post tests are
designed as an independent assessment of learned capabilities.

Conclusion

It should be apparent that one does not go through this
lengthy process unless it is likely that a given course will
be developed and taught often and unless there is strong
interest in determining the learning outcomes resulting from
the course by estimates using test scores. If the course is
only to be developed and taught once or twice, or if there
are other good indicators of the functional capabilities of
participants after the course is completed, it would make no
sense to spend the effort involved in developing comprehensive
pre-tests, post tests, and delayed post tests.



Chapter 11

CONDUCTING ITEM ANALYSIS AND TEST RELIABILITY STUDIES

Thus far, all of the test development activities
described in earlier chapters have depended upon the knowledge
of the course instructors and some other persons expert in
the content of the course. If the steps outlined previously
are followed, it is likely that tests will be developed which
are good estimates of the degree of learning which results
from a given course.

However, there are well established test item analysis
methods which can be used to calculate the difficulty level
of given items and also the ability of items to discriminate
among persons who understand the content of the course and
those who do not. A difficulty index can be calculated for
each item as can a discrimination index. These statistics
provide empirical information about the behavior of the items
in the tests which have been assembled. This empirical
information, coupled with the information derived from the
logical development and analysis of items described in earlier
chapters, can be used to rewrite and adjust test items to
achieve optimal levels of difficulty and discrimination.

In addition, there are well developed methods for the
empirical estimation of the reliability of a test, i.e., the
degree to which the test consistently measures the presence
of given levels of ability in persons from trial to trial.
If tests are not reasonably reliable they are poor estimates
of the degree of learning which has occurred. Consequently
it is important to determine the reliability of tests and to
modify tests to be more reliable if they are not so initially.

Only a brief introduction to some of the more common
methods by which to carry out item analysis and reliability
studies is presented here. The books referenced later in this
chapter provide detailed information about these procedures.

Before proceeding, it is important to realize there is 
a problem with the computation of item difficulties, 
discrimination indices, and the reliability of tests by 
traditional means because of the nature of the tests which
have been described in earlier chapters. Most of the standard
procedures for carrying out item analysis and test reliability
studies have been developed for norm referenced testing
approaches. Yet, this book describes and recommends a
criterion referenced testing approach for developing tests
by which to assess learning outcomes of continuing engineering
education. This means that typical item analysis and test
reliability estimation procedures, while useful to the task
of test construction, must be modified. It also means that
persons using these methods need to be clear about the origins
and assumptions implicit in them. After examining this
problem, suggestions will be made for the use of existing
procedures for conducting item analysis and test reliability
studies in a manner consistent with the purpose of testing
as it occurs in criterion referenced situations.

Norm Referenced Testing Procedures

There are two general approaches to testing. One is
called the norm referenced approach and the other the criterion
referenced approach (Airasian & Madaus, 1974; Bloom et al.,
1972; Maratuza, 1977; Millman, 1974).

The norm referenced approach is based upon obtaining a 
representative sample of persons from some population of 
interest. Tests which have been developed are administered 
to this sample of persons. Judgments about the adequacy of
an individual's performance are made by reference to the mean
performance of the group and the observed standard deviation
for the sample. Individual persons' performance scores on
the test are ranked with respect to one another. "Passing"
performance is defined in terms of falling within some range
of typical performance around or above the mean score for
the group. Arbitrary criterion cut off points are used. For
example, it is sometimes asserted that any person whose
score falls below the 50th percentile will not be admitted to
a program. Sometimes the cut-off point is determined in
terms of standard deviations so that persons whose scores on
the test fall below three-fourths of a standard deviation
below the group mean are judged as not knowing enough to
have passed the test.
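The standard deviation form of the cut-off can be illustrated with
a short sketch. The scores below are invented, and the use of the
sample standard deviation is an assumption made for illustration.

```python
from statistics import mean, stdev


def sd_cutoff(scores, sd_units_below=0.75):
    """Passing score set a number of standard deviations below the group mean."""
    return mean(scores) - sd_units_below * stdev(scores)


def failing_scores(scores, sd_units_below=0.75):
    """Scores that fall below the norm referenced cut-off."""
    cutoff = sd_cutoff(scores, sd_units_below)
    return [s for s in scores if s < cutoff]


# Hypothetical group in which every person scored well on a 25 item test.
strong_group = [20, 21, 22, 23, 24]
cutoff = sd_cutoff(strong_group)
failures = failing_scores(strong_group)
```

Because the cut-off is computed from the group itself, the lowest
scorer in this invented group falls below it even though that
person answered 20 of 25 items correctly.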




Norm referenced approaches are commonly used in the
construction of standardized achievement tests which are
used for making decisions about admission of students to
academic programs. Group intelligence tests are also norm
referenced. Most professional licensing and certification
examinations are also norm referenced. Thus, in a given
administration of a professional engineering licensure
examination, a certain percentage of the persons who take the
test may be expected to fail the test because success is
defined with respect to achieving a score no lower than so
many standard deviations below the mean of the group taking
the examination. Because all test administrations to samples
of persons result in some variance and because that variance
can be used to define a minimum passing score which falls at
some arbitrary point below the mean, no matter how selective
the group being tested and how skilled each person is in the
knowledge and skill being tested, some proportion of the
persons taking the test will fail by definition.

The norm group for a norm referenced test may be only
the persons taking a particular test at a particular time
for licensure as an engineer. In practice, the norm reference
group ought to be much more inclusive. Norms for well
developed achievement tests are usually based upon national
random samples of persons from the population of interest.
Information about an individual's test score can be
interpreted with respect to these national and regional norms in
terms of ranking the competence of the individual in the test
content compared to these other persons.

A much less adequate version of norm referenced testing
has also been the most common practice of professors in
engineering, scientific, and technical fields. This has
often been referred to as "grading on the curve." In this
approach the professor constructs an examination based on
the course content. The test is then administered. Grades
are assigned by placing a certain percentage of the top
ranked observed student scores in the A category, the next
group of scores in the B category and so forth until a
certain percentage of the lowest ranked scores are placed
in the F or fail category. Criteria for determining these
grade assignments may be based upon simple rank order of
scores, percentile ranks, standard deviation units above or
below the mean, or other similar procedures. Whatever the
procedure there are often serious problems with this norm
referenced approach based on the performance of the students
in only a particular classroom.

First of all, most groups of students in engineering 
classes are highly selected with respect to their prior
knowledge and skill which is required for entry to the
program and successful participation in the class learning
activities. This is so especially for the students in
advanced courses. It is also true for many students who are
professional engineers enrolled in continuing
education courses.

Suppose that a group of students in an advanced level
engineering course in a technical area is already highly
skilled and knowledgeable in the prerequisites to the course
activities. Suppose that, in addition, these persons are
also highly motivated to learn what the course has to offer.
Also suppose that the course is well organized and the
instructor is quite effective in his or her teaching. Now,
let us suppose that the instructor prepares an examination to
test the students' knowledge of course content. Let us assume
there are 16 students in this course. We will assume it is
a well designed and reliable examination. Let us also assume
that most students have worked hard and have, indeed, learned
most of the content and skills of the course. Each student
completes the 25 item examination. The test consists of
some very difficult items, a few easy items, and other
items of moderate difficulty for persons of this general
ability level. The mean score for the group is [illegible],
and the standard deviation is 2.3. Would it make sense to
fail those students whose observed score on the test happened
to be ranked last or happened to be one or one and one-half
standard deviations below the class mean? What would be the
meaning of the grade assigned to each student? Suppose the
next semester the same course is taught in the same way by
the same instructor to another group of 13 students. The
same examination is administered. This time the mean for
the group is 13.2 and the standard deviation is 4.4. Once
again the instructor assigns grades by rank ordering the
students' observed scores in the class. He uses the same
criterion of so many standard deviations below the mean as
indicating failure. What do the set of grades in the second
class mean? In particular what do the sets of grades for
persons in the two classes mean with respect to one another?
Not very much! In both cases the normative reference group
is non-random, non-representative of the larger population
of persons at that level of development and expertise, and
too small and truncated.

Norm referenced testing procedures make good sense for
determining the skill or competence of a person in comparison
to other persons in the general population of interest. As
such the norm referenced approach makes sense in the
development of standardized achievement tests based on
national or regional samples of the population of interest,
insuring that persons from all ability levels have an equal
opportunity to be sampled in the norm group. Without random
sampling of persons across ability levels the individual
score of a person cannot rank him or her with respect to the
distribution of knowledge and skill in the population to
which he or she belongs. This property of norm referenced
tests makes them very useful for standardized achievement
tests and very inadequate as a means of estimating the degree
of specific intended learning outcomes for courses. Because
of these problems with norm referenced testing approaches an
alternative has been developed which is much more appropriate
to instructional settings for the estimation of the success
of instruction in achieving specified learning outcomes by
individual students.

Criterion Referenced Testing Procedures

Particular courses have specific intended learning
outcomes which are usually much more precisely and narrowly
defined than the domain of knowledge and skill typically
tested for on a standardized achievement test. In a
particular course, the instructor usually wants to teach
some finite number of specific facts, concepts, principles,
and procedures as well as skill in actually applying all of
these to the solution of problems faced in the real world
work situation. This is the case especially for continuing
education courses for practicing engineers. There is little
interest in comparing the performance of individual students
who have completed the course to one another or to some
external normative reference group. There is much interest
in comparing the performance of students on tasks typical of
those faced in the work setting with acceptable standards of
safe and informed practice. The student is expected to learn
the appropriate use of knowledge and skill acquired in the
course in order to exhibit performance of some particular
tasks at a criterion of mastery. This approach to testing
has come to be known as criterion referenced testing.

In criterion referenced testing each person's
performance is compared to some standard of mastery or
competence in the actual performance of tasks directly related
to the knowledge and skills taught in the course. The
criterion for comparison is how well individuals are able to
perform certain tasks which are specifically sampled to be
representative of the range of typical tasks being instructed.
These selected tasks are similar to those assigned for
practice during the course of instruction. The entire
emphasis is upon the degree to which each person, having
received instruction, is capable of exhibiting competent
performance on specific tasks similar to those used to teach
persons in the course but never before encountered in that
particular configuration. Testing tasks and instructional
tasks are parallel. Both are focused on specific performance
capabilities which are seen as the purpose of instruction.
There is little or no interest in grading persons with
respect to one another, only in estimating how well persons
have learned particular complex performances. These
performances are usually expressed in action verbs as noted
in Chapter 10. They are operational and observable. The
acceptable level of "correct" performance is defined as
mastery. The mastery level is determined by the actual
degree of correct performance which is possible or required

[...] level in the real domain of actual [...]
performance based engineer[...] (Grogan, 1979), self-paced
teaching [...] 1976, the personalized system of
instruction [...] similar approaches are [...] oriented. All
use criterion referenced testing approaches. These methods have
been widely used and have been shown to be very effective.

It should be clear to the reader that the suggestions
and guidelines which have been presented in previous chapters
are directed toward the criterion referenced approach to
testing and assessment of persons' functional competencies.
The problem is that many of the procedures which have been
developed to determine item difficulty, discrimination, and
test reliability have been developed under normative testing
approaches for the development of very inclusive and
non-specific performance tests which test for very broad domains
of general achievement.



Item Difficulty and Discrimination Indices for Norm and
Criterion Referenced Tests

Under norm referenced approaches to testing a common
rule of thumb for determining item difficulties is to divide
all of the persons who took the test into a top quarter and
a bottom quarter group with respect to observed scores.
Sometimes if the total group size is small, the top and bottom
thirds of the scores are used. An index of item difficulty
is calculated in a straightforward manner:

$$
\text{NR Item Difficulty Index} =
\frac{\text{Proportion of high scorers correct} + \text{Proportion of low scorers correct}}{2}
$$

The difficulty of each item is calculated in this manner and
the average item difficulty of the test items is determined.
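A minimal sketch of this calculation follows. It assumes each
person's answers are stored as a list of 1 (correct) and 0
(incorrect) values, one per item; the function names and the data
are hypothetical. Note that the index as defined is an average
proportion correct, so larger values belong to items that more
people answered correctly.

```python
def split_extremes(responses, parts=4):
    """Return (high group, low group) ranked by total test score.

    parts=4 takes the top and bottom quarters; parts=3 takes thirds.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    size = max(1, len(ranked) // parts)
    return ranked[:size], ranked[-size:]


def proportion_correct(group, item):
    """Proportion of persons in the group who got the item correct."""
    return sum(person[item] for person in group) / len(group)


def nr_difficulty(responses, item, parts=4):
    """Average of the high and low groups' proportions correct on an item."""
    high, low = split_extremes(responses, parts)
    return (proportion_correct(high, item) + proportion_correct(low, item)) / 2


# Hypothetical answers of eight persons to a four item test.
responses = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 0, 0],
    [1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0],
]
indices = [nr_difficulty(responses, item) for item in range(4)]
average_difficulty = sum(indices) / len(indices)
```

With these invented data the four item indices are .5, .75, .5,
and .25, and the average item difficulty for the test is .5.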

The properties of reliability formulae and long experience have
shown that tests with average item difficulties of .5 produce
the optimal range of scores and result in optimum reliability
of the test. Average item difficulties of this magnitude
are ideally suited to the task of separating people and
ranking them in terms of test scores obtained. In actual
practice test items which have difficulty indices ranging
from .25 to .75 are desirable for this separation and
ranking function. In norm referenced testing it is desirable
to end up with a test which has a nice range of item
difficulties with the mean item difficulty being about .5.
If the items are too difficult, the test will not
permit students to demonstrate what they know, the scores
will all be lumped together at the low end of the scale, and
students' test scores will not be suitable for separation and
ranking. If the items are too easy, the test will
not assess what students do not know, the scores will be
all lumped together at the high end of the scale, and, again,
not be suitable for separation and ranking. But ranking in
respect to what? With respect to other persons, of course,
for that is the matter of interest in norm referenced
approaches to test construction.

In a criterion referenced approach there is little or 
no interest in comparison or ranking of persons' scores with
one another. There is strong interest in ranking or
categorizing persons' scores in relation to some criterion or
standard of competent performance on some sample of tasks
similar to those used in teaching the knowledge and skill.
Typically the criterion is arbitrarily established and called
a "mastery" level. Mastery is usually defined in terms of a
certain percentage of correct performances over trials or
tasks. For example, the criterion for mastery of a course
might be absolutely correct performance on at least 80
percent of all test tasks given on any particular test. Less
than 100 percent correct performance on at least 80 percent
of all test tasks would be viewed as failure to achieve
mastery, while 100 percent correct performance on 80 or a
greater percentage of all the test tasks administered would
be viewed as demonstration of mastery.
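The mastery rule in this example can be written out as a small
sketch. The function name and the task results are hypothetical;
each task is scored True only when it was performed entirely
correctly, and the 80 percent requirement is the one used in the
example above.

```python
def achieved_mastery(task_results, required_fraction=0.80):
    """True if the share of fully correct tasks meets the mastery criterion.

    task_results holds one boolean per test task: True only when the
    task was performed 100 percent correctly.
    """
    if not task_results:
        raise ValueError("at least one test task is required")
    return sum(task_results) / len(task_results) >= required_fraction


# Hypothetical results for two persons on a ten task test.
person_one = [True] * 8 + [False] * 2   # 8 of 10 tasks fully correct
person_two = [True] * 7 + [False] * 3   # 7 of 10 tasks fully correct
```

Under this rule the first person demonstrates mastery and the
second does not, regardless of how anyone else in the course
performed.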

In the criterion referenced testing approach it makes
little sense to compute item difficulties in the traditional
manner. A better procedure is to administer each test task
to groups of persons who have not completed the course of
instruction and who should be naive in the skills and
procedures being taught. If this is so, these persons should
perform poorly on the test tasks or items. However, a second
group of persons who have completed the course, and/or who,
independent of course completion, are known to have mastered
the knowledge and skills which are the intended performance
objectives for the course, should perform at very high levels
of mastery on each test task or item. Thus, one determines
item difficulties by administration of each item to such
groups and recording the frequency of correct and incorrect
responses in each group. Items which discriminate between
naive persons and skilled persons should be clearly apparent.
The naive group should have very high frequencies of error
to each item and the expert group very high levels of correct
or mastery level performances. If these results are not
observed, the test items or tasks need to be modified.

It should be noted that this is precisely the procedure
suggested in earlier chapters as a method of external
validation of a performance task. The same procedure is
carried out, except that data are tabulated on each item for
both groups. Items which do not discriminate between the
naive and expert groups are eliminated or modified to insure
that they do discriminate. Under criterion referenced
testing procedures and mastery learning approaches, pre-tests
on course knowledge and skill ought to prove very difficult,
with very low proportions of correct responses, when item
difficulty is calculated in the traditional manner. Post
tests or delayed post tests ought to prove very easy, with
very high proportions of correct responses, when item
difficulty is calculated in the traditional manner.



A discrimination index is typically computed for each
item under the norm referenced approach. This, as with the
difficulty index, is based on a comparison of scores of
persons in the top and bottom extremes of the observed range
of scores. A rule of thumb for calculation of a simple item
discrimination index for each item is to subtract from the
proportion of high scorers in the top third of the total test
score and who got a given item correct, the proportion of
low scorers in the bottom third of the total test score and
who also got that item correct. This is repeated for every
item. That is:

$$
\text{NR Item Discrimination Index} =
\text{Proportion of high scorers who were correct} - \text{Proportion of low scorers who were correct}
$$

If an item discriminates well it has a high and positive
value approaching the limit of +1. If an item discriminates
not at all or poorly it approaches an index of 0. If an
item is so poor it discriminates in a reverse manner such
that persons with high scores on the total test consistently
get the item wrong while persons with low scores on the total
test consistently get it right, the discrimination index
approaches a negative value of -1.
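The three cases just described can be reproduced with a short
sketch. It assumes total test scores and item results are held in
two parallel lists, one entry per person; all names and values are
invented for illustration.

```python
def nr_discrimination(total_scores, item_correct, parts=3):
    """High group proportion correct minus low group proportion correct.

    total_scores and item_correct are parallel lists (one entry per
    person); parts=3 compares the top and bottom thirds.
    """
    ranked = sorted(zip(total_scores, item_correct), key=lambda pair: pair[0],
                    reverse=True)
    size = max(1, len(ranked) // parts)
    high = [correct for _, correct in ranked[:size]]
    low = [correct for _, correct in ranked[-size:]]
    return sum(high) / size - sum(low) / size


# Hypothetical total scores for nine persons and their results on three items.
total_scores = [24, 23, 22, 18, 17, 16, 11, 10, 9]
item_good = [1, 1, 1, 1, 0, 1, 0, 0, 0]       # high scorers right, low wrong
item_flat = [1, 1, 1, 1, 1, 1, 1, 1, 1]       # everyone right
item_reversed = [0, 0, 0, 1, 0, 1, 1, 1, 1]   # high scorers wrong, low right

d_good = nr_discrimination(total_scores, item_good)
d_flat = nr_discrimination(total_scores, item_flat)
d_reversed = nr_discrimination(total_scores, item_reversed)
```

The three invented items give indices of +1, 0, and -1
respectively.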

Again, in criterion referenced testing approaches,
discrimination indices are more frequently determined from
examination of the results of naive groups of persons'
performances on given items, and the performances on the same
items by groups of persons expert in the course content. If
an item discriminates well the naive group should consistently
get the item wrong and the expert group should consistently
get the item correct.

Calculation of Item Difficulty and Discrimination Indices for
Criterion Referenced Tests

Actual indices of difficulty and discrimination of items
may be calculated for criterion referenced tests by means
similar to the traditional methods. In the case of item
difficulty one can add the proportion of experts who got an
item correct to the proportion of naive persons who got the
same item correct and divide by two. That is:

$$
\text{CR Item Difficulty Index} =
\frac{\text{Proportion of experts with correct answer to an item} + \text{Proportion of naive persons with correct answer to an item}}{2}
$$

Once again it can be seen that the optimum average item
difficulty for the entire test is .5. This value indicates
that, on the average, the naive group of uninstructed persons
was not able to perform the test tasks but that persons
known to be expert in the course knowledge and skill were
able to perform consistently at a mastery level. If one
develops such a test it can be used to measure the effectiveness
of instruction of a given course in bringing the students
enrolled in the course to levels of mastery. In addition,
within the limits of error of measurement and the validity
of the test tasks, each person who has completed the course
may be certified as having mastered or not mastered the
knowledge and skills which were the intended performance
outcomes for the course.
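A sketch of the criterion referenced difficulty index follows,
assuming each group's results on one item are stored as a list of
1 (correct) and 0 (incorrect) values. The names and data are
hypothetical.

```python
def cr_difficulty(expert_correct, naive_correct):
    """Average of the expert and naive groups' proportions correct on an item."""
    p_expert = sum(expert_correct) / len(expert_correct)
    p_naive = sum(naive_correct) / len(naive_correct)
    return (p_expert + p_naive) / 2


# Hypothetical results on single items for five experts and five naive persons.
ideal_item = cr_difficulty([1, 1, 1, 1, 1], [0, 0, 0, 0, 0])      # experts right, naive wrong
too_easy_item = cr_difficulty([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])   # everyone right
ambiguous_item = cr_difficulty([1, 1, 0, 0], [1, 1, 0, 0])        # half of each group right
```

The ideal item and the ambiguous item both give an index of .5,
which suggests that the difficulty index is best read together
with a discrimination index rather than by itself.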

In a similar manner the traditional way of calculation
of item discrimination indices may be modified to calculate
a discrimination index for each item for a criterion
referenced approach. In this case:

$$
\text{CR Item Discrimination Index} =
\text{Proportion of experts who had an item correct} - \text{Proportion of naive persons who had an item correct}
$$

If an item discriminates perfectly between these two groups
it will have a value of +1. All the experts will correctly
perform on the item and all of the naive persons will perform
incorrectly. If the proportion of persons in each group is
the same in terms of correct performance, the item will not
discriminate at all and the value will be zero. This can
happen if the item is too difficult so that everyone in both
groups gets it wrong or if the item is so easy everyone gets
it correct. It can also happen if the item is unrelated or
invalid with respect to the content and skill of the
performance domain. In such a case, correct and incorrect
answers might be randomly distributed across the expert and
naive groups in equal proportions. Again, the item would not
discriminate between the groups. Therefore, the item should
be removed or modified so that it will discriminate.

As in the case of the norm referenced approach to
testing, it is possible to have items which discriminate in
a reverse manner where the expert group performs consistently
incorrectly and the naive group performs consistently
correctly. Such an item is poor and will have a
discrimination index approaching the value of -1.
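The cases described above can be checked with a brief sketch. The
function names, the data, and in particular the 0.5 threshold used
to flag items for revision are assumptions made for illustration;
the text does not specify a numerical threshold.

```python
def cr_discrimination(expert_correct, naive_correct):
    """Expert proportion correct minus naive proportion correct on an item."""
    p_expert = sum(expert_correct) / len(expert_correct)
    p_naive = sum(naive_correct) / len(naive_correct)
    return p_expert - p_naive


def review_item(index, keep_threshold=0.5):
    """Suggest an action for an item; the threshold is a hypothetical choice."""
    if index < 0:
        return "reversed: remove or rewrite"
    if index < keep_threshold:
        return "weak: remove or modify"
    return "keep"


# Hypothetical item results (1 = correct) for five experts and five naive persons.
perfect = cr_discrimination([1, 1, 1, 1, 1], [0, 0, 0, 0, 0])
too_easy = cr_discrimination([1, 1, 1, 1, 1], [1, 1, 1, 1, 1])
too_hard = cr_discrimination([0, 0, 0, 0, 0], [0, 0, 0, 0, 0])
reversed_item = cr_discrimination([0, 0, 0, 0, 0], [1, 1, 1, 1, 1])
```

The too easy and too hard items both give an index of zero, so
the index alone does not say which problem is present; the raw
proportions correct in each group show that.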

Item Analysis Procedures in Perspective

In actual practice as tests and test items are developed
it is a good idea to calculate both the traditional norm
referenced and the criterion referenced indices of difficulty
and discrimination. Detailed procedures more sophisticated
than those presented here may be found in Maratuza (1977),
particularly in Chapter 17 which deals with criterion
referenced testing. There are also many existing computer
programs for the routine calculation of these values along
with estimates of test reliability. The values which are
obtained from such analyses of test items are useful in
revising tests to be more valid and to better discriminate
between those persons who have learned the intended knowledge
and skill taught in a course and those who have not. Should
the director of a continuing engineering education program
desire assistance in these matters, expert help is usually
available in most university testing, counseling, and
computing centers. Persons working in these centers are
usually facile in these procedures. Other persons with high
levels of expertise in this area of methodology are typically
found in educational psychology, psychology, and behavioral
science departments.

It should be pointed out that the actual calculation of 
item difficulties and discrimination indices along with test 
reliability estimates is useful in the task of making better
test tasks and items. However, these procedures are not
useful by themselves. The procedures outlined in Chapter 10
about how to define performance objectives, sample instructional
tasks within these performance domains, and conduct external
validation of these instructional test tasks are more basic
and prerequisite to good tests than are formal item analysis
and test reliability studies. Furthermore, all of these
earlier procedures are best carried out by the persons who
design and teach the courses with little assistance from
persons expert in test construction. Serious attention to
the design of good test items and tasks in these other areas
does much to insure that the tests which are developed will
be sound.

Methods of Reliability Estimation: The NR and CR Cases

The same problem exists for the calculation of test
reliabilities as exists for the calculation of item difficulty
and discrimination indices. Most of the traditional
procedures are based on norm referenced testing procedures
and were developed in the interest of producing highly
reliable standardized achievement tests. As in the case of
item difficulty and discrimination indices estimates, the
traditional procedures need to be modified when calculating
the reliability estimates of criterion referenced tests. In
this section the traditional procedures will first be
described and the modifications necessary for criterion
referenced testing approaches will follow.

There are four common methods of estimating the
reliability of a traditional norm referenced test. These are
the alternate forms method, the test-retest method, the
subdivided test method, and internal consistency methods
(Nunnally, 1972). All of these methods provide estimates of
the stability or consistency of the mental measurement
achieved by the total test score for individuals.

( • 

Alternate Forms Method

The alternate forms method requires the existence of two
or more forms of the same test. All forms should be parallel
to one another with respect to the knowledge and skill being
measured. The reliability of the test(s) is estimated based
on the correlation of the scores of the same individuals on
alternate forms of the test. The procedure requires the
same group of persons to take both forms of the test,
preferably at the same time to avoid changes due to
experience or other factors. Each person's score is
determined on each of the two or more tests. The correlation
coefficient is computed based on pairs of persons' scores
across alternate forms of the test.

The alternate forms method of estimation of test
reliability is a very good method for norm referenced tests.
It measures the sources of reliability related to the errors
in the sampling of test items for the two tests from the
large knowledge domain of interest. It also measures the
reliability of the tests in relation to errors due to the
fluctuations of individuals' performances over subsequent
administrations of the test.

For a criterion referenced test, the alternate forms
method is not a good procedure. This is particularly so if
the criterion referenced test is designed to determine
mastery level of the content area following instruction, which
is usually the case. The problem is that both of the alternate
forms of the criterion referenced mastery test will show very
little variation across persons who have completed the course
and mastered the content. Correlation procedures are based
upon fitting a line to a set of paired coordinates in order
to obtain information about the relationship of one set of
scores to the other set of scores. Variation in test scores
is required for this to be a meaningful procedure.

Suppose three alternate parallel forms of a criterion
referenced test were developed for a short course for
engineers. Test forms A and B were administered to six
engineers after they had completed the short course. Form A
was given immediately after the course and form B was
administered the next day. Form C was given as a pre-test
before the course was underway. The results of these
administrations are shown in Table 6. The means and standard
deviations for each of the three tests are also presented.
In addition, the observed minimum, maximum, and average
mastery levels of the participants are presented for each
test administration.

                            Table 6

          Scores of Engineers on Alternate Forms of
          Three Criterion Referenced Mastery Tests*

                       Form A            Form B            Form C
Person             Post Test Score   Post Test Score   Pre-Test Score

1                        56                60                15
2                        60                53                12
3                        59                58                18
4                        55                53                10
5                        57                60                21
6                        58                55                 7

Mean Raw Score         57.50             56.50             13.83
Standard Deviation      1.87              3.27              5.19
Minimum Mastery (%)     91.7%             88.3%             11.7%
Average Mastery (%)     95.8%             94.2%             23.1%
Maximum Mastery (%)    100.0%            100.0%             35.0%

*All scores reported in raw score units unless otherwise
noted. Maximum possible score on any form of the test is 60.

For the relationship y = mx + b for Form A and B scores,
m = -0.31, b = 74.57, and the correlation coefficient
between forms A and B is -0.18.

The maximum score which can be acquired on any test
form is 60 raw score points. Inspection of the scores of
individuals on forms A and B administered as post tests
reveals that persons are performing at high levels of
mastery. In fact, the lowest observed post test score of 53
is at the 88.3 percent mastery level on form B. On form A
the lowest observed score is 55 at a 91.7 percent mastery
level. The average mastery level on form A is 95.8 percent
and for form B, 94.2 percent. Clearly, from a logical
standpoint, two tests have been developed which function very
nearly the same. Both indicate, to a high common degree,
that the six persons completing the course have mastered the
material. Further support for this conclusion is found in the
pre-test results for form C. The pre-test scores are those
of a naive group, not yet instructed in the course content.
These scores are very low, at or below a 35 percent mastery
level in all cases. Suppose both form A and form B were used
as a pre-test for other groups and that the results obtained
were very much the same as those obtained when form C is
used as a pre-test. Suppose that form C was also used as a
post test for some of the other groups and, once again, the
results obtained were very similar to the post test results
for forms A and B in the present example. On the basis
of this information, it would be reasonable to conclude that
the forms A, B, and C as post tests are reliable. They
produce highly similar and replicable results. This
conclusion is strengthened if the three forms of the test
continue to be used in the post test role with other groups
of persons completing the course and these groups also show
consistently high levels of mastery of the test material.
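The summary rows of Table 6 can be checked with a few lines of Python. The following is only a minimal sketch: the function names are illustrative, and it assumes that the tabled standard deviations use the sample (n - 1) formula, which is the convention that reproduces the printed values.

```python
from statistics import mean, stdev

MAX_SCORE = 60  # maximum possible raw score on any form (Table 6)

# Raw scores of the six engineers, copied from Table 6.
scores = {
    "A (post)": [56, 60, 59, 55, 57, 58],
    "B (post)": [60, 53, 58, 53, 60, 55],
    "C (pre)": [15, 12, 18, 10, 21, 7],
}


def mastery_percent(raw_score, max_score=MAX_SCORE):
    """Express a raw score as a percentage of the maximum score."""
    return 100.0 * raw_score / max_score


def summarize(raw_scores):
    """Return the summary statistics reported at the foot of Table 6."""
    return {
        "mean": round(mean(raw_scores), 2),
        "sd": round(stdev(raw_scores), 2),  # sample SD, n - 1 divisor
        "min_mastery": round(mastery_percent(min(raw_scores)), 1),
        "avg_mastery": round(mastery_percent(mean(raw_scores)), 1),
        "max_mastery": round(mastery_percent(max(raw_scores)), 1),
    }


summary = {form: summarize(vals) for form, vals in scores.items()}
for form, stats in summary.items():
    print(form, stats)
```

Run on the tabled scores, the sketch gives an average mastery of 94.2 percent for form B, in agreement with the table.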

Yet if the reliability of form A is calculated based
on correlation of the scores of the six persons with form B
using the data in Table 6, a very different conclusion would
be reached. The correlation coefficient for the two sets of
scores may be calculated from the slope of the line fitted
to the six sets of paired data points by the equation,

\[ y = mx + b \]

The correlation coefficient can be found by the relationship

\[ r = m \, \frac{s_x}{s_y} \]

where:

     r   = the correlation coefficient

     m   = the slope of the fitted line

     s_x = the standard deviation of the values
           in the x array of scores

     s_y = the standard deviation of the values
           in the y array of scores

Using this relationship and the information contained in
Table 6, the reliability, in the form of the correlation
coefficient between two alternate forms of the test, may be
estimated for the tests developed for the short course. When
the proper values are substituted into the above equation,
after the slope has been determined and the standard
deviations of both sets of test scores have been calculated,
the following results are obtained.

\[ r = -0.314 \left( \frac{1.871}{3.271} \right) \]

\[ r = -0.180 \]

The reliability estimate for the test forms is very low.
Under any usual norm referenced testing procedure the test
would be judged as being very unreliable and would be
discarded.
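The calculation can be retraced with a short Python sketch. It fits the least squares line with form A as the x array and form B as the y array, then converts the slope to a correlation coefficient by the ratio of the two standard deviations. The function name is illustrative only, and treating form A as x is an assumption, though it is the assignment that reproduces the slope and intercept reported with Table 6.

```python
from statistics import mean, stdev

# Post test scores of the six engineers, from Table 6.
form_a = [56, 60, 59, 55, 57, 58]  # x array
form_b = [60, 53, 58, 53, 60, 55]  # y array


def fit_line(x, y):
    """Least squares slope and intercept for y = m*x + b."""
    x_bar, y_bar = mean(x), mean(y)
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum_xy / sum_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


m, b = fit_line(form_a, form_b)

# r = m * (s_x / s_y), using the same SD convention for both arrays.
r = m * stdev(form_a) / stdev(form_b)

print(round(m, 3), round(b, 2), round(r, 2))
```

The negative sign of the slope carries over to the correlation coefficient, so the two highly similar post tests appear, by this method, to be slightly negatively related.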

The basic problem is that this method of estimation of
reliability was developed for situations where there is a
very large domain of material for which a huge number of items
may be written. In such cases it is usually of interest to
sample some items from which to estimate an individual's
total knowledge of the broad domain. It is also of interest
to rank individuals in some population of persons in terms
of how much knowledge of the domain each individual in the
sample has with respect to other persons in the population.
Under such circumstances persons can be expected to vary
quite a bit in their scores on the test. The correlation of
alternate forms of a test in making estimates of individual
persons' knowledge of the domain is the matter of interest.
The variation in test scores between persons in the sample
is required for alternate test form reliability estimation
procedures. Typical achievement tests of a comprehensive
nature are good examples in which such reliability
estimation procedures are appropriate. This is because the
knowledge domain being tested is very large and the
inclusive sample of persons used to norm the test varies
greatly in ability levels. Therefore there is much between
person variance in the test scores of individuals, even
though with a highly reliable test, any given person's test
score will not vary much over repeated administrations of
the test.

However, in the situations presented in this book it is
clear that the domain of knowledge or skill to be tested is
usually much more circumscribed. Usually the question of
interest is, "How well have the course participants learned
to do the specific things the course was intended to teach
them to do proficiently?" The interest is in producing
instruction in areas of complex knowledge and skill which
leads to uniform high levels of consistent performance across
persons, and in achieving mastery of a relatively small
number of specific concepts, skills, or procedures which are
the goals of the course.

In addressing this problem, it has frequently been
suggested that the best form of reliability estimation for
criterion referenced tests of a mastery level type is the
simple degree to which the results replicate from one test to
another and especially from one group of persons who have
completed the training to other similar groups who have
also completed the training (Maratuza, 1977, Chapter 12;
Tyler, 1974). Of course, one must be sure that the tests
developed do consistently discriminate among naive and
expert groups of persons before making the inference that
replication of results of a given post test with multiple
groups of trainees after course completion means that learning
has occurred. One can also accomplish this procedure by
randomly assigning half of a group of enrollees to a pre-test
administered before instruction and the remaining half to a
post test administered after instruction. Comparison of the
scores of these two groups reveals if the test is functioning
appropriately without confounding the interpretation with
repeated measurement of the same individuals with similar
test forms (Maratuza, 1977). Without the base line data on
the test from naive groups, one does not know whether the
performance on the post test reflects learning resulting from
the course or learning acquired before the course. This
is another reason for the use of parallel comprehensive pre-
and post tests in the development and evaluation of short
courses, as was suggested in earlier chapters. These tests
are also useful for making estimates of learning outcomes
for individuals resulting from such courses, and these
measurements may be reported as estimates of individuals'
learning.

Test-Retest Method

Another common method of estimating the reliability of
a test is the test-retest method. This involves administration
of the same test form twice to the same group of persons
with some period of time between the two administrations
to prevent memory from playing a key role in the
production of the second set of responses. Again, the scores
from the two administrations are correlated across persons,
and the correlation coefficient obtained is used as an
estimate of the reliability of the test. The same problems
described earlier in relation to the alternate forms
procedure as a method of reliability estimation for criterion
referenced tests apply here as well. If persons have
mastered a set of particular skills, concepts, and procedures,
there will be very little variation in scores from one test
administration to another and from one person to another.
The correlation coefficient will not be a good estimate of the
test reliability. There are also practical problems involved
in administering the same test twice for a short course where
the total number of items and the scope of the content
covered in the course may make improved performance on the
second administration, because of practice and memory factors,
more likely than on a test consisting of many items sampled
from a very large and general domain of knowledge and skill.

A modification which is appropriate to reliability
estimation for criterion referenced tests does exist (Millman,
1974). Under this procedure one develops a large pool of
items for the area to be tested, with attention to having
parallel forms of items. Next, two test forms are produced
by random assignment of the items or item pairs to test forms.
These items are then intermingled into one test and the test
is administered to the persons who have completed the short
course or to some other group such as an expert or naive
group. The test is then scored. The score for each person
on form A and form B is determined. A chart or graph is then
prepared which lists the absolute difference in scores of
each person on both forms of the test. Inspection of the
results reveals how parallel the two forms of the tests are
and how reliably they measure the domain of specific
knowledge and skill which is of interest. An additional
procedure is to calculate the mean value of the absolute
difference score across persons. This value serves as an
index of the degree of consistency to which the two test forms
measure persons' competencies in the area tested. The closer
the mean difference value is to zero, the more consistent
are the tests. This method is not affected by a lack of
variability among persons on the test scores because of
achievement of mastery. It is, however, affected by the
difficulty levels of the items. Therefore, information about
the test difficulty level must be considered in making a
judgment of the reliability by this method. It is also clear
that the most appropriate use of this method is with a group
of experts who are known to have mastered the knowledge,
skills, and procedures of a particular short course, or
persons who have been taught to master this body of material
by having completed the course successfully. The reliability
of the test is of interest in relation to the test's ability
to consistently discriminate between persons who have mastered
and are expert in the content of the course versus those
persons who have not and are naive.
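The mean absolute difference index described above is simple to compute. The sketch below is illustrative only. It reuses the form A and form B post test scores of Table 6 as a convenient data set, even though those forms were given on separate occasions rather than intermingled into one test as the procedure requires, and the function names are hypothetical.

```python
# Post test scores of the six engineers on forms A and B (Table 6),
# reused here purely for illustration.
form_a = [56, 60, 59, 55, 57, 58]
form_b = [60, 53, 58, 53, 60, 55]


def absolute_differences(scores_a, scores_b):
    """Absolute difference between each person's two form scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("each person needs a score on both forms")
    return [abs(a - b) for a, b in zip(scores_a, scores_b)]


def mean_absolute_difference(scores_a, scores_b):
    """Mean of the absolute differences; closer to zero is better."""
    diffs = absolute_differences(scores_a, scores_b)
    return sum(diffs) / len(diffs)


diffs = absolute_differences(form_a, form_b)
index = mean_absolute_difference(form_a, form_b)

print(diffs)            # one entry per person, for the chart
print(round(index, 2))  # mean absolute difference in raw points
```

For these scores the mean absolute difference is a little over three raw score points on a 60 point test. Unlike the correlation coefficient, this index does not collapse when nearly every person has reached mastery.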

The similarity of this procedure to the suggestions
made in earlier chapters for the development and use of
pre-tests, embedded tests, post tests, and delayed post tests
should be obvious. In all cases the matter of primary
interest is to determine if course objectives have been
achieved by students and to insure that tests are developed
which are capable of answering this question. It is of
little interest to rank order persons in terms of the degree
to which they comprehend some complex and inclusive field
of general knowledge. It is of great interest to determine
if a given course is resulting in students learning specific
skills and procedures to mastery levels.

Subdivided Test Method

Another traditional method of estimating reliability of
a test is to subdivide the test into two parts. This is often
done by considering all of the odd numbered items to consist
of one form and the even numbered items another form. The
test is then administered to a group of persons. Each
person's test is scored twice, once on the odd items and
once on the even items. The pairs of scores for individuals
are correlated and the correlation coefficient is taken as
an estimate of the reliability of the test. The procedure is
similar to the alternate form and test-retest methods. The
same limitations for criterion referenced tests exist for this
procedure as in the other two cases. The same methods for
overcoming these problems, as described by Millman (1974) and
others, also apply.
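The odd-even scoring can be sketched in Python as follows. The response matrix is hypothetical (five persons, six right or wrong items) and was invented only to show the mechanics. The function names are likewise illustrative.

```python
from math import sqrt

# Hypothetical data: one row per person, one column per item,
# 1 = item answered correctly, 0 = item answered incorrectly.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]


def half_scores(item_matrix):
    """Score each person on the odd and the even numbered items."""
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    return odd, even


def pearson_r(x, y):
    """Product moment correlation between two arrays of scores."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sum_xx = sum((a - x_bar) ** 2 for a in x)
    sum_yy = sum((b - y_bar) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


odd, even = half_scores(responses)
split_half_r = pearson_r(odd, even)
print(odd, even, round(split_half_r, 3))
```

If every person earned the same score on either half, as can happen under mastery conditions, the denominator would be zero and the coefficient undefined, which is the limitation noted in the text.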

Internal Consistency Methods

Perhaps the most common method of estimation of the
reliability of a norm referenced test is the Kuder-Richardson
formula 20 method, commonly referred to as the KR 20. The
basic rationale for this method is that if the test is
internally consistent, all items should tend to measure the
same thing and should correlate highly with one another. If
all the items in a test correlate highly with one another,
the inference is that the test would likely correlate highly
with an alternate form which does not exist but which would
be composed of similar items.

There are two common forms of the KR 20 formula. The
first form is used to calculate the reliability of a test
where each item is scored as being wrong or right. The
scoring for each item must be dichotomous, as is usually the
case with multiple choice tests where every item is scored
as correct or incorrect. This formula is:

\[ r = \frac{n}{n-1} \left( 1 - \frac{\sum pq}{s^2} \right) \]

where:  r   = reliability estimate for the test

        n   = number of items on the test

        s^2 = observed squared standard deviation or variance
              of the total test scores across persons

        p   = proportion of persons passing a given item

        q   = proportion of persons not passing a given item

        (the sum of pq is taken over all n items)

Another version of the formula is available for test
items where partial credit may be awarded rather than a 0 or
1 score only. For example, if one were to administer a test
consisting of 10 problems where each completed problem is
scaled from 0 to 10 points in terms of completeness and
accuracy, one would use the second form of the KR 20. Here
the formula is:

\[ r = \frac{n}{n-1} \left( 1 - \frac{\sum s_j^2}{s^2} \right) \]

where:  r     = reliability estimate of the test

        n     = number of items on the test

        s^2   = observed squared standard deviation or
                variance of the total test scores across persons

        s_j^2 = observed squared standard deviation or variance
                of each individual item score across persons
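Both forms of the KR 20 can be sketched in a few lines of Python. This is a minimal illustration and not a definitive implementation: the response matrix is hypothetical, the function names are invented for the example, and the variances are computed with the population (divide by N) convention so that the variance of a right or wrong item equals pq exactly.

```python
from statistics import pvariance

# Hypothetical data: one row per person, one column per item,
# 1 = item answered correctly, 0 = item answered incorrectly.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]


def total_variance(item_matrix):
    """Variance of total test scores across persons (s squared)."""
    s2 = pvariance([sum(row) for row in item_matrix])
    if s2 == 0:
        # Everyone earned the same total score; the ratio is undefined.
        raise ValueError("no variation in total scores")
    return s2


def kr20(item_matrix):
    """First form: items scored 0 or 1, uses the sum of p*q."""
    n_items = len(item_matrix[0])
    n_persons = len(item_matrix)
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_persons
        sum_pq += p * (1 - p)
    s2 = total_variance(item_matrix)
    return (n_items / (n_items - 1)) * (1 - sum_pq / s2)


def kr20_partial_credit(item_matrix):
    """Second form: uses the sum of the item score variances."""
    n_items = len(item_matrix[0])
    sum_item_var = sum(
        pvariance([row[j] for row in item_matrix]) for j in range(n_items)
    )
    s2 = total_variance(item_matrix)
    return (n_items / (n_items - 1)) * (1 - sum_item_var / s2)


print(round(kr20(responses), 3))
print(round(kr20_partial_credit(responses), 3))
```

On right or wrong data the two functions agree, since the variance of a dichotomous item is pq. When all persons earn the same total score the sketch raises an error rather than returning a value, because the total score variance in the denominator is zero.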

It is obvious from inspection of the two formulas that
the problems noted earlier about the use of methods of
estimation of reliability of norm referenced tests for
criterion referenced tests also apply here. The reason is
that once again, the procedure depends upon having a large
variation in performance across individuals on test items.
The almost certain expectation is that the variation will be
too low in cases of criterion referenced tests, under
conditions of mastery learning. Consequently, the KR 20
method is inappropriate for estimating the reliability of a
criterion referenced test under mastery learning conditions.
As in the other situations, there exist methods by which
to modify the KR 20 procedure for use with criterion
referenced tests of mastery level (Hambleton & Novick, 1973;
Livingston, 1972, 1973). One way is to deviate each person's
score from an arbitrarily determined mastery level rather
than from the group mean score. This deviation can be used
instead of the usual variance values in the KR 20. However,
this method still requires a sufficient amount of variation
across persons on test items and in their total test scores.
The reliability estimates of test results similar to those
listed in Table 6 by this method would still yield low values.
The best method of reliability determination for criterion
referenced tests is consistent replication of results across
repeated testings of similar naive and expert groups, as has
been described earlier.

Conclusion

Attempts should be made to determine the reliability of
tests used in continuing engineering education courses even
when these tests are of the criterion referenced type and
mastery learning is the expectation. This will usually be
the case in most short courses. An exception is the case of
the short course concerned with remediation of persons'
knowledge and skill in a very broad domain, as is the case in
courses which prepare engineers to take licensing or
certification examinations. Here there is a large and broad
domain of knowledge and skill involved. There are norms
determined by the test scores of all certified professional
engineers on these examinations. In this case, it is of
interest to rank order persons and to determine something of
an individual's knowledge of the broad domain with respect to
other engineers in his or her specialty area. In this
situation, the tests which are developed for assessment of
the learning outcomes of remediation or preparatory courses
are most appropriately constructed from items sampled from
across the very large domain of knowledge and skill under
consideration. The pre-tests, which may be used to advise
students whether or not they need to take a remediation
course, and the post tests, which measure the degree of
learning resulting from completion of the course, are most
appropriately shorter but parallel to the achievement tests
used by the professional engineering societies for licensure
purposes. In this case, all of the traditional norm
referenced procedures for the calculation of item
difficulties, discrimination indices, and test reliability
estimates are very useful and highly appropriate.

There may be other times when the goals of a particular
short course are more concerned with changing and improving
some broad area of knowledge rather than producing highly
specific skill and knowledge outcomes. In these cases, the
courses may be more like the typical remediation or
preparatory course and it may be appropriate to use norm
referenced testing procedures by which to develop, validate,
and determine the reliability of the tests. However, most
short courses in engineering for continuing education purposes
will by necessity be directed toward a much more defined set
of specific intended performance capabilities, which should
be achieved to high levels of mastery following instruction.
Therefore, the criterion referenced test approach to matters
of determination of item difficulties and discrimination
indices and to estimation of test reliability will generally
be necessary.

The procedures outlined here for the modification of
the usual norm referenced testing approach to meet the needs
of criterion referenced and mastery learning approaches are
very elementary. For more information on how to accomplish
well designed reliability studies the reader is advised to
refer to Applying Norm-Referenced and Criterion-Referenced
Measurement in Education by Maratuza (1977). This book is
particularly useful since it presents both the traditional
procedures and detailed explanations of how these procedures
need to be modified for the case of criterion referenced
testing and mastery learning expectations. Another source,
which is widely used as a guide to the carrying out of item
analysis and test validity and reliability studies, is
Educational Measurement and Evaluation by Nunnally (1972).
This latter book makes no reference to the criterion
referenced testing situation but provides all the basic
information and procedures for the traditional norm
referenced situation.

Once item analysis and test reliability studies have
been completed, it is usually necessary to modify some test
items and to revise whole tests in order to produce better
measuring instruments. The modifications and adjustments
necessary should be made with reference to the matrices and
plans described in Chapter 10, which produced the original
items. The data from formal item analysis and reliability
studies is not by itself sufficient to the task of rewriting
and revising individual items and tests. It is only useful
in pointing out difficulties and problems with particular
items and test forms which might not otherwise be noticed.
The actual business of rewriting a given test item to better
discriminate between naive and expert persons, or the actual
modification of a test for the same purposes, is basically
a logical task dependent upon the content of the course of
instruction, the intended performance outcomes, and the
matrix of these as they define the purpose and intent of
instruction. The procedures outlined in Chapter 10 are the
basis for not only the initial development of test items and
tests, but also for the revision of items and tests to
better serve their intended discriminative functions. Without
attention to the procedures outlined in Chapter 10, the item
analysis and test reliability procedures described in this
chapter have little utility or meaning.

Even well designed tests have inherent limitations as
estimates of learning outcomes. These limitations and their
implications for the proper use of tests are the topics of
the next chapter.



Chapter 12

LIMITATIONS OF TESTS

The limitations noted in this chapter apply to traditional
norm referenced testing procedures in particular. Criterion
referenced testing procedures, similar to those advocated in
earlier chapters, are less subject to these limitations.
However, there are limitations for any procedure which uses
test results to make inferences about the degree of an
individual student's learning resulting from a course of
instruction.

Sources of Invalidity and Unreliability

No test, no matter how well constructed, is perfectly
valid or reliable. This means there is always some question
about what the test measures in relation to the domain of
performance and knowledge it is supposed to measure. There
is also another question about the consistency with which the
test measures whatever it is that is being measured. Because
of these questions, test scores have elements of error in
their estimates of the performance capabilities of persons.
Even if a population of persons could be found in which every
person had a precise and unchanging amount of knowledge and
skill in the performance domain being tested, repeated
administration of even a very good test to the same group
would produce variation in test scores of these persons.



This variation is because: 1) there are always multiple
ways in which a given test item may be interpreted by the
persons completing the task; 2) the way persons respond to a
given test item or task is subject to influences such as
time of day, degree of hunger, the presence or absence of
personal concerns and worries, and many more uncontrolled
variables which cause performance to vary; and 3) there are
often variations in the way test item responses are scored
or judged "correct" or "incorrect".

With objectively scored tests, variation in scoring or
judging the degree of "correctness" of persons' responses is
removed. Objectively scored tests consist of tests which
have a scoring key by which to determine unambiguously if a
given response to a given item is right or wrong. One example
of objectively scored tests is the multiple choice test format
which is widely used because of this property. However, even
an objectively scored test is not truly objective. There are
still the usual ambiguities and idiosyncratic variations in
each item which cause the test to be less than perfectly
valid or reliable no matter how objective the scoring
procedure.

Even when test items are problems of an engineering
nature requiring the use of particular course concepts and
skills to set up and work a problem to determine the
specifications of a particular piece of equipment or obtain
certain numerical values, there is a great variation in the
scoring of the results. In a study involving 1,071 professors
of engineering, mathematics, and physics, Clyde Work (1976)
found huge variations in the scoring of a set of common
student responses to eight questions on an engineering statics
and dynamics examination. The responses were actual
responses of students to a real examination. The persons
who scored the examination were a national sample of
professors, from universities and technical colleges, who
actually taught such courses. Each item was corrected on a
0 to 10 point scale where points were deducted for errors up
to a maximum of 10 for any given item. The results showed
that for most items, the scoring variations ranged from no
points deducted to all 10 points deducted! For most other
items the spread in points assigned to individual items was
also very large, in most cases exceeding six points. It
should be recalled that what was being scored by these
professors were the same answers to the same problems by the
same students. The variation in scoring was caused by the
interpretations of individual professors. Some took off
points for lack of neatness. Some did not. Others gave
variable amounts of partial credit for correctly setting up
a problem while others did not. The net effect of all of
these individual judgments by professors about the accuracy
of the student's response is a huge variation in the
performance score assigned the student. On any given
problem the same student response is likely to be graded by
different professors all the way from completely wrong to
completely correct with all intermediate scores being well
represented (Work, 1976).

Other reasons for validity and reliability problems of
tests have to do with the abbreviated and incomplete sample
of performance which is encapsulated in each test item. Test
tasks must always be abbreviated, shortened, or otherwise
reduced in complexity and time demands in order to allow their
completion in a reasonable length of time. This means that
test tasks are artificial samples of the domain of performance
which is of interest. The manner in which the tasks are
sampled from the actual domain of performance and the manner
in which they are abbreviated directly affect the validity
of the test tasks. Even when test tasks are abbreviated in
the best manner possible, there remain problems in inferring
from the test tasks the ability of the person in real world
performance tasks which are typically more complex, carried
out over a longer period of time, and typically accomplished
with the help and support of other persons in a cooperative
manner. For example, cooperation on a test task is considered
cheating and is usually punished. Yet cooperation in carrying
out the solution to complex problems in real work activities
of engineers is highly valued and encouraged.

For all of these reasons, making inferences about the
actual capabilities of persons in areas of complex skill and
performance on the basis of a test score is risky. Test
scores are more useful for providing information about the
degree to which basic information, concepts, relationships,
and procedures have been learned and understood than for
making predictions about who will actually perform well on
the job in dealing with a complex set of tasks. Even well
designed standardized tests, such as are frequently used in
admissions procedures in colleges, as well as the standardized
achievement tests used to certify persons competent in an
academic discipline, do not predict adult achievement in
actual work settings (McClelland, 1973; Stice, 1979). These
types of test scores do predict academic success of students
in formal college and university programs, but only to a
small degree. Standardized achievement tests, even when
matched directly to the content of academic programs, at
best account for only 25 percent of the variance observed in
student achievement (Lavin, 1965). The other 75 percent of
the variance in observed student achievement in academic
courses and programs is attributable to differences among
students in interests, motivation, persistence, experience,
and opportunity.

Scores from tests developed for courses and group
statistics are more accurate predictors of the success of
a given course in teaching certain concepts, information,
and procedures to persons on a regular basis than they are
as precise measures of an individual's learning. This is
because the effectiveness of a course can be determined by
using the test data across many individuals who have
participated in the course and completed its tests. The
average data are much more stable than any individual test
score because idiosyncratic responses of persons to test
items tend to balance one another out. The standard error of
the mean score of a group of persons who have been tested is
always much smaller than the standard error of the estimate
of an individual's test score.

The Standard Error of Estimate

The standard error of an individual's test score is an
estimate of the deviation of a person's particular score from
that person's true score on a given administration of a test
among many repeated administrations. Of course, it is not
possible to repeatedly test one person for as many as 15 to
20 times on a given test. It is also not possible to assume
that a person would not change in his or her knowledge over
repeated administrations of the same test. However, in an
imaginary situation, if a person were tested repeatedly on
one test and the standard deviation of his or her scores were
calculated, a standard error of the estimate for the score
of that person would have been calculated. The person's
true score would be estimated by the mean score of the
repeated administrations.




In practice the standard error of estimate of a test
score is calculated in other simple ways. One common
procedure for the calculation is listed below.

$$\sigma_{\text{measure}} = \sigma_t \sqrt{1 - r_{tt}}$$

Where $\sigma_{\text{measure}}$ = standard error of measurement of the test

      $\sigma_t$ = standard deviation of the test for the
                   sample of persons completing it

      $r_{tt}$ = the reliability of the test obtained for
                 the sample of persons completing it

The standard error of measurement for a test provides
information about a confidence interval or band around each
person's observed test score. The greater the reliability
of the test, the smaller the confidence interval and the
more accurately a person's observed score predicts the person's
true score. The standard error of a test score allows the
assignment of odds to the likelihood that any person's
observed score is actually different from some other person's
score versus being different from each other only due to
chance factors resulting from the less than perfect
reliability of the test. Perhaps an example will help.

Suppose a 20 item test is developed for a continuing
engineering education course in soil mechanics. The
reliability of the test is calculated by the KR 20 alpha
coefficient method to be .74 based upon a sample of 26 persons
who took the test. The raw score test mean and standard
deviation for this sample are 13.08 and 3.77, respectively.







The standard error of measurement for this test would be
calculated by substitution of the correct values into the
equation listed above. The results would be:

$$\sigma_{\text{measure}} = 3.77 \sqrt{1 - 0.74}$$

$$\sigma_{\text{measure}} = 1.92 \text{ raw score units}$$

If we wished to be correct 95 percent of the time in
classifying persons as having scored differently from one
another or from some arbitrary level of performance, we would
have to multiply 1.92 times 1.96 to obtain a confidence band
around an individual's observed score. Suppose Mr. Perkins,
a student in the course, obtained a score of 12 on the test.
Multiplying the standard error of the test times the value
of 1.96, the number of standard deviations on both sides of
the mean on the unit normal curve inclusive of the 95
percent confidence interval, a value of 3.76 is obtained.
This means Mr. Perkins' true score on the test will fall
within 12 ± 3.76 raw score units on this test 95 percent of
the time over repeated administrations of the test to Mr.
Perkins at this point in his ability with respect to course
knowledge and skills. Suppose the course instructor had
set a score of 15 raw score points as the minimum mastery
level for which successful completion of the course would be
recognized. Could Mr. Perkins have performed adequately,
and the difference between his observed score and the criterion
score be due to error of measurement and unreliability of the
test? The answer is "yes". At his present ability level,
Mr. Perkins' true score on the test could be expected to fall
within the observed value of from 8.24 to 15.76 on repeated
administrations of the test 95 percent of the time.
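
The arithmetic of the Perkins example can be checked with a
short script. The following is a minimal sketch in Python; the
function names are illustrative assumptions, not part of the
report, and the standard error is rounded to two decimals
before the band is formed, as the text does.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), the formula used in the text."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(observed, sem, z=1.96):
    """Return (low, high) around an observed score; z=1.96 gives a 95 percent band."""
    half_width = z * sem
    return observed - half_width, observed + half_width


# Values from the soil mechanics example in the text.
sem = round(standard_error_of_measurement(sd=3.77, reliability=0.74), 2)
low, high = confidence_band(observed=12, sem=sem)
criterion = 15  # mastery level set by the instructor in the example

print(f"SEM = {sem:.2f} raw score units")
print(f"95 percent band: {low:.2f} to {high:.2f}")
print("criterion inside band:", low <= criterion <= high)
```

Run as written, this reproduces the values in the example: a
standard error of 1.92 and a band from 8.24 to 15.76, which
contains the criterion score of 15.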

Stability of Group Mean Scores

The values presented for the reliability of the 20 item
test and the standard error of measurement for the test are
rather typical for such tests. The example makes it clear
that even with good tests, error of measurement of an
individual's ability is large. However, the error contained
in the test score estimate of the average ability of persons
completing the course is much smaller than the standard
error of estimate of an individual's test score. Group
means are very stable and the standard error of the mean, or
average score of a given group of students, is very small.
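
To make the comparison concrete, here is a small Python sketch
using the sample from the soil mechanics example (standard
deviation 3.77, 26 persons, reliability .74). It assumes the
conventional formula for the standard error of a mean, the
standard deviation divided by the square root of the number of
persons, which the report does not state explicitly; the
variable names are illustrative.

```python
import math

sd = 3.77           # standard deviation of raw scores in the example
n_persons = 26      # number of persons who took the test
reliability = 0.74  # KR 20 reliability reported for the sample

# Standard error of measurement for one person's score (formula from the text).
sem_individual = sd * math.sqrt(1.0 - reliability)

# Standard error of the group mean (conventional formula, assumed here).
se_mean = sd / math.sqrt(n_persons)

print(f"individual score: {sem_individual:.2f} raw score units")
print(f"group mean:       {se_mean:.2f} raw score units")
print(f"ratio:            {sem_individual / se_mean:.1f}")
```

Under these assumptions the standard error of the mean is about
0.74 raw score units, against 1.92 for an individual score, and
it shrinks further as more persons are tested.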



The stability of group means makes it possible to accurately
estimate the effectiveness of a course in achieving its
intended learning outcomes based on the average performance
of participants on post or delayed post tests. If the tests
have been carefully developed following the procedures
suggested in earlier chapters, this estimate of course
effectiveness will be quite valid as well as accurate and
consistent. The estimate is simply how typically effective
the course is in achieving the specific knowledge and skill
outcomes it posits as its specific performance objectives.
The estimate is not about the ultimate value or effectiveness
of the course in real world performance of the engineer in
his or her work setting. It is a well known fact that course
grades and test scores do not predict adult achievement in
real world work settings (Hoyt, 1965; McClelland, 1973;
Stice, 1979). What they do report is how well specific course
content and skills have been learned and, to a limited extent,
how well the student is able to learn more related course
content and skills in future courses. Using both past course
grades and well designed standardized achievement tests as
predictors of future academic performance, the maximum amount
of variance which can be accounted for in actual student
performance is approximately 25 percent (Lavin, 1965). One
should always keep these limitations in mind when the results
of continuing engineering education students' performances
are being reported to them and their employers.

In summary, estimating an individual's knowledge and
skill acquisition following completion of a course and based
on a single test score is a much less accurate procedure than
estimating the average effectiveness of a course in achieving
its knowledge and skill objectives based on many students'
scores on one test. Test scores pooled across persons can be
valid indicators of course effectiveness. Test scores alone
for individual students are less valid indicators of the
achievement of learning outcomes. In short, tests are
more useful for evaluating courses than for evaluating persons.




In making decisions and inferences about the amount of
learning which has been achieved by individuals, multiple and
independent test scores are more useful than a single test
score. However, the inference about degree of learning should
not be based only upon test scores, even if multiple tests
are used. What is needed in addition are observations of
students' performances on complex tasks and careful examination
of the products produced by students in the execution of
complex performance tasks which have been assigned. As
mentioned in earlier chapters, these products of performance
include designs, analyses, reports, solutions to complex
problems, plans, and other products normal to the work
performance of the engineer and upon which the content and
skills of a particular course can be brought to bear.

Other sources of information about the degree of learning
of course content and skills should not be overlooked. These
include asking the course participant how much he or she has
learned, how useful the learning was in relation to improving
work performance in particular areas, and how often course
content and skills are actually being used on the job.
Supervisors and employers should also be asked to make similar
judgments. If all of this information is collected, it is
much more appropriate to make an inference about the degree
of learning of an individual than it is to do so based only
on a single test score or even multiple test scores.






Advantages of Criterion Referenced Tests

Although many of the limitations of tests also apply
to criterion referenced tests, the restrictions are not as
great. There are a number of reasons for this.

Traditional norm referenced tests tend to be global and
not well defined in terms of the particular knowledge, skill,
and performance capabilities they are testing. Consequently,
the items selected for inclusion on the test are only a few
sampled from a huge domain of potential items. This is the
normal situation in standardized achievement tests. In
criterion referenced tests the focus of the testing is on
very specific and well defined knowledges, skills, or
performances, not on comparing the performance of individual
persons on the test against other persons in a sample of
similar persons. The student's performance of the specific
tasks presented in the test is of interest because these
tasks are the intended goals of some preceding instructional
activity which is specifically designed to teach that
performance.

The criterion referenced testing situation is more
tightly controlled than is the typical norm referenced testing
situation. On a criterion referenced test the student
demonstrates what he or she can do on some specific task
which has been taught. Typically the student will have to
recall knowledge and information learned, make judgments
about how to proceed, and apply skills learned to complete the
task. After the task is completed by the student the course
instructor can determine how well it was completed. The
product of the performance can be compared with the product
of the performance of a skilled professional for the same
task. The scoring of the accuracy and the completeness of
the student response can be quite objective with respect to
some standard criterion of acceptable performance.

In a typical norm referenced testing situation there
is no comparable grounding of the test items or tasks in
specific aspects of skilled performance to a criterion level.
Thus individual test scores are not directly interpretable in
terms of what the student can or cannot do in the way of
executing a complex performance with skill. All that can be
said is how well the student has performed on a set of test
tasks in comparison to a group of peers. The judgment of
the student's performance capability is relative and general.
In the criterion referenced case, the judgment of the
student's performance is absolute and specific to well defined
skills or capabilities.

If properly developed and administered, criterion
referenced tests have an inherent validity and reliability
which makes them superior to norm referenced tests (Tyler,
1974), particularly in the context of short courses and other
learning experiences typical of many continuing education
courses where what is to be instructed and what is to be
achieved by students are very well defined and highly specific.






There can still be problems of agreement among
different raters regarding a student's performance on a
criterion referenced test task. Just as Work (1976) noted
wide variation among instructors who scored the same student's
answers to a common problem, instructors scoring the
performance of a student on a criterion test item can disagree.
However, they would not typically be expected to disagree as
much because part of developing a criterion referenced test is
the determination of what will be taken as evidence of correct
performance. In short, a common, high level of criterion
performance is defined before testing, and even before
instruction. Consequently, during a course students are
specifically instructed and guided toward exhibiting mastery
of the knowledge and skills being studied. It is clear both
to them and to the instructor what it is which is to be
accomplished and how well the instructional and test tasks
must be performed. There is a common and clearly communicated
agreement on the standards of performance which are acceptable.
These criteria are usually grounded in the standards that
are judged acceptable in actual practice of expert engineers
on similar tasks in actual work settings.



The education of professional persons to high levels of
mastery in complex skills and abilities is typically
approached in this manner. For a detailed logical,
philosophical, and empirical exploration of this topic the
reader is referred to a study by Lacefield (1980) concerning
the measurement of competence.



Thus it is reasonable to infer whether or not a student,
having completed a course of instruction, and having performed
on a criterion referenced test, has achieved the desired
level of competence or skill. Criterion referenced
performance assessment is used particularly in situations
where it is critical that the student actually be competent
before being allowed to practice. As was mentioned in an
earlier chapter, these types of test tasks are used with
physicians before they are allowed to conduct surgery,
with aircraft pilots before they are allowed to fly planes,
and with persons who operate very expensive and complex
equipment where an error or lack of competence on the
individual's part would be a threat to property and life.
In such cases criterion referenced performance tasks are
routinely used to conduct assessments of the individual's
competence. The test tasks are very specific to the
performance capability under consideration. They are very
valid and reliable because they involve simulations or
abbreviated test tasks which demand nearly the same types and
quality of performance as does the actual performance.

The procedures outlined in Chapter 10, which describe
how to develop good assessment procedures and test tasks by
which to measure learning outcomes for continuing education
courses, are directed toward producing criterion referenced
tests. The procedures in Chapter 11 explain how traditional
norm referenced testing procedures for determining item
difficulties, discrimination indices, and test reliability
need to be modified for criterion referenced tests. Use of
the traditional item and test analysis procedures for
criterion referenced tests is inappropriate. Just as the
traditional concepts of test reliability do not apply to
criterion referenced tests, neither does the traditional
concept of standard error of measurement of a test score.

In Chapter 11 it was noted that all of the usual methods
of estimation of the reliability of a test depend upon the
occurrence of a large amount of variation among the test
scores of students completing the test. If there is little
variance among scores the reliability will be low and the
procedure for computation of the reliability will be
invalid. The same problem exists even with methods such
as Livingston's (1972), which is designed to modify traditional
procedures of internal consistency reliability estimation for
criterion referenced testing. There often is simply too
little variation in student scores upon completion of the
course for these procedures to work well.

When course objectives have been clearly laid out,
when instructors work to insure that the student has mastered
the course content, and when nearly all students upon
completing the course have mastered the material as judged by
their test performance, then it is clear much learning has
occurred. Many studies show that instruction of this type
is superior to traditional instruction in terms of the
amount students learn, how long they retain it, and how
well they can apply it in the next course (Kulik & Kulik,
1976; Kulik, et al., 1979). Yet if one were to compute the
reliability of the tests used to make determinations of
the students' achievement in these courses, using the
traditional means, the tests would have poor reliability.
This is because the variance in test scores is reduced
because of uniform high achievement by students. Yet we
know that the tests are reliable estimates of students'
learning because students who have completed such courses
and scored highly on the examinations for them consistently
outperform other students taught in traditional courses.
The mastery learning students perform better than students
instructed in traditional ways on standardized examinations
of the content area, on common final comprehensive
examinations, and in achievement in future and more advanced
courses in the same content area (Kulik & Kulik, 1976).
These results are consistently found over many empirical
studies conducted in engineering and the physical and
social sciences (Kulik et al., 1979). Clearly the tests
must be measuring student achievement of learning outcomes
in reliable ways or these consistent results would not hold.
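
The effect of restricted score variance on a traditional
reliability coefficient can be illustrated with a small
computation. The sketch below, in Python, implements the
standard KR 20 formula for items scored 0 or 1. The two
response matrices are invented for illustration (they are not
data from this report), and the function name `kr20` is an
assumption.

```python
def kr20(responses):
    """KR 20 reliability for a list of persons, each a list of 0/1 item scores.

    Uses the population variance of total scores, consistent with the
    p * q item variances in the numerator.
    """
    n_persons = len(responses)
    n_items = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if var_total == 0:
        raise ValueError("KR 20 is undefined when all total scores are equal")
    # Sum of p * q over items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_persons
        sum_pq += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical group with widely spread scores (0 through 5 correct).
spread_group = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
]

# Hypothetical mastery group: everyone scores 4 or 5 out of 5.
mastery_group = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 1, 1, 1],
]

print(f"spread group KR 20:  {kr20(spread_group):.2f}")
print(f"mastery group KR 20: {kr20(mastery_group):.2f}")
```

With these invented data the coefficient is about .83 for the
spread group and falls below zero for the mastery group, even
though the mastery group answered nearly every item correctly.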

As was pointed out in Chapter 11, the best indication
of the reliability of a criterion referenced test is its
ability to consistently discriminate between groups of persons
known to be skilled in the performance of interest and those
who are known not to be skilled. Replicability of results
with multiple groups of naive and expert persons is the best
indication of test reliability or consistency (Tyler, 1974).
The best means to insure this replication of results is to
develop the test tasks and the instructional procedures
according to the steps outlined in Table 3, Chapter 10.
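
One way to picture this replication check is to compare, for
each replication, the proportion of expert and of naive
persons who reach the criterion on a test. The following
Python sketch is only an illustration under stated
assumptions: the scores, the criterion of 15, and the helper
names are hypothetical, and the report itself prescribes no
particular index.

```python
def proportion_at_criterion(scores, criterion):
    """Fraction of persons whose score meets or exceeds the criterion."""
    return sum(score >= criterion for score in scores) / len(scores)


def discrimination_by_replication(replications, criterion):
    """For each (expert_scores, naive_scores) pair, return the difference
    between the expert and naive proportions reaching the criterion."""
    return [
        proportion_at_criterion(experts, criterion)
        - proportion_at_criterion(naives, criterion)
        for experts, naives in replications
    ]


# Hypothetical raw scores on a 20 item test from three replications.
replications = [
    ([18, 17, 19, 16, 15], [9, 11, 14, 8, 12]),
    ([17, 16, 20, 18, 14], [10, 7, 13, 15, 9]),
    ([19, 15, 16, 17, 18], [12, 11, 8, 10, 13]),
]

differences = discrimination_by_replication(replications, criterion=15)
for number, difference in enumerate(differences, start=1):
    print(f"replication {number}: expert minus naive = {difference:.2f}")
```

A test that discriminates consistently would show a large
positive difference in every replication, as these invented
scores do.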

Likewise, the construct of the standard error of
measurement does not apply well to the criterion referenced
situation (Livingston, 1973). Other procedures more
appropriate to estimating true mastery scores have been
developed. These methods are quite different from the
concept of a person's true score and the standard error of
estimate of a test score which are used in norm referenced
testing (Hambleton & Novick, 1973). Rather than compute the
probability of a person's observed score falling within some
range around his or her true score, in the criterion
referenced testing situation one is interested in asserting
whether the person can or cannot perform the task correctly.
The probability is 0 to 1. The judgment is accurate only
insofar as the test task has been properly grounded in the
performance domain of interest as outlined in Chapter 10.

Even criterion referenced tests must necessarily consist
of abbreviated performance tasks. Flying a flight simulator
on instruments is not really the same as flying a real
commercial aircraft in bad weather with a full load of
passengers. Designing an automation procedure for a simulated
industrial production process and testing the procedure with
a demonstration microprocessor kit and a computer program
is not the same as actually developing and installing an
automated procedure in a real factory. Even the best
simulations, abbreviated test tasks, and practice
examinations must always be only initial and partial
assessments of the individual's probable competence in the
actual work domain with its many uncontrolled variables and
greater complexity. Yet the limitations of test tasks,
because they are abbreviations of actual work situations, are
less for criterion referenced tests than for norm referenced
tests. This is because there is an explicit attempt to tie
criterion referenced tests directly to the performance
required in the actual work domain, and because the
capabilities being tested are much more specific and defined
than in the norm referenced situation.

Even with criterion referenced testing procedures it is
possible to make stronger inferences about the success
of groups of persons achieving the desired learning outcomes
following a course of instruction than is the case for
individuals. Demonstration of consistently high levels of
complex performance by many participants following instruction
provides convincing evidence that the course is effective in
achieving its intended outcomes.



The next section of this book deals with reporting
measurements of learning outcomes of courses to individuals
and groups. Before beginning this topic, a summary of the
main points made in this and the previous chapters on
testing is presented.

Tests can never measure real performance in true to life
situations. They must always consist of abbreviated tasks
and samples of activities designed to assess knowledge and
skills believed to be basic to effective practice in some
area. Properly designed tests can reveal much about how well
a course is fostering its intended learning outcomes. Tests
have limited value for making inferences about individual
persons' performance capabilities in actual job situations.
Although these limitations apply to all tests, they are
less applicable to properly designed criterion referenced
tests than to typical norm referenced tests. Consequently,
tests designed according to the procedures outlined in
Chapter 10 and elsewhere (e.g., Maratuza, 1977, Chapter 17)
provide much better estimates of individual achievement of
intended learning outcomes than do other types of testing.
They also provide better estimates of the effectiveness of
the course and its instructional procedures.



By stating course objectives in performance terms and
by logical sampling of test items and tasks within the domain
of performance of interest, it is possible to construct tests
which have reasonable validity. External checks on the
validity of tests, involving the use of experts to examine
and critique the content of the test and to actually complete
the test, help to further validate a test. Administration of
the test to naive and expert groups will also help develop
measures capable of distinguishing between the presence and
absence of the knowledge and skill areas of interest.

Item analysis and test reliability studies may be
carried out once sufficient data are collected from actual
test administrations. These procedures can help refine a test
and achieve a proper balance of easy and difficult test items.
Adjustments can also be made in the ability of test items to
discriminate between persons skilled in the content and
knowledge of the course and those who are not skilled or have
less skill. Reliability estimates may be obtained in several
ways. These estimates are useful for determining to what
degree a test is consistent in measuring aspects of a
knowledge, skill, and performance domain across individuals and
groups.

It is a time consuming task to construct good tests.
Even the best tests have less than perfect validity and
reliability. There are many reasons for this, including the
abbreviated nature of test tasks, the artificial work setting
in which they are administered, and the usual variations in
human emotion, motivation, attention, and changes in
interpretation of the content of test items by different
persons and the same persons in different test administrations.
However, if tests have been carefully developed, one can
make strong inferences about the general effectiveness of a
course in teaching specific information, skills, and
procedures to groups of participants. This is because the
inferences about the group performance and the effectiveness
of the course are based upon aggregated data across
individuals and result in statistical means for which there
is relatively large consistency or stability from one
replication of a course to another, other things being
equal. The inference about any particular individual's
learning based on a test score is an estimate subject to
much more variation.

Consequently, care must be taken in the use of test
scores as the means of making inferences about the degree of
learning achieved by persons as the result of short courses
in engineering work performance areas. The best use of test
scores for individuals who have completed short courses is
to recognize them as rough estimates of the knowledge and
skill levels of persons in some specific areas which have been
taught because they are believed to be related to the complex
performance domain. The most inappropriate use of such
scores is to report the individual qualified or unqualified
to perform in the actual complex, on-the-job work area,
especially on the basis of a single test score. Other
information based on actual observation of an individual's
performance on complex tasks and evaluations of the products
of those performances is necessary to make such inferences.

Another good use of test scores is to make inferences
about the general effectiveness of a given course for
purposes of formative evaluation and subsequent revisions in
course content and presentation to better serve the needs of
participants. Still another highly appropriate use of test
score data is the summative evaluation of the effectiveness
of a course at any given time in its history in order to
provide prospective clients, course developers, and others
with good information about typical course effectiveness in
teaching particular skills, knowledge, and information.

Questions about how qualified persons are for job
performance in actual work settings after they have completed
specific short courses should always be answered on the basis
of multiple indicators of the individual's performance
capability in realistic work settings with representative
problems or tasks. About all test scores for short courses
can reveal is the degree to which the individual engineer has
learned knowledge, skills, and procedures thought to be basic
to good practice in the actual performance area. High scores
do not mean that the person is necessarily competent. Low
scores can mean that the person is unlikely to be effective
in the performance areas. However, scores lower than arbitrary
criterion performance levels can also be caused by measurement
error in the test instruments themselves, by inadequate
instructional procedures, or by a lack of readiness of the
student to enter into and profit from the instructional
activities. One should be alert to and attempt to control or
compensate for these factors when using test results and other
information about performance to make decisions about
individuals' capabilities or the effectiveness of courses
and programs.






Chapter 13

REPORTING THE ASSESSMENT OF LEARNING OUTCOMES

Persons with interest in the learning outcomes of
continuing education courses include the participants
enrolled in the course, the faculty who have developed and
taught the course, the employers and supervisors of the
course enrollees, the administrators and governing bodies of
the academic unit involved in operating the course, and the
governing bodies of professional engineering societies who
are concerned with the quality of the course. For these
reasons, there are a number of different purposes and
methods for reporting the resulting learning outcomes for
any continuing education course. A wide range of information
must be collected to meet the needs of these persons to know
the effects of a course on enrollees' performance. All of
the persons involved need this information for the purpose of
decision making (Lacefield, 1980).


The individual engineer is most concerned with
information about his or her understanding of specific
concepts and procedures taught in the course. Information
about student success in the specific tasks of the course is
part of the instructional process. Both participants and
instructors need to know what individual students have and
have not yet learned fully and, specifically, what remains for
them to learn more about. The purpose of learning assessments
in this context is to facilitate learning by the individual.
The results affect subsequent decisions by the individual
student concerning what parts of the course to study more,
where to ask for assistance, and what future areas of study
to engage in on one's own or by participation in additional
courses or other formal study activities.

At the end of the course of instruction, the enrollees
also have a strong interest in how well they have performed.
Information concerning the degree of mastery of course
content and skills achieved by individuals is of interest
to these persons. People also often want to know how they
performed in comparison to others in the group. Instructors
should provide both types of information to individual
enrollees.

Instructors' Needs

Course instructors have similar information needs in
order to carry out the instructional activities and decisions
required for the teaching of the course. Decisions about
when individual students and groups of students understand
course concepts and can go on to the next task require some
sort of assessment. This assessment must be made rapidly
and during the course of actual instruction. Embedded test
tasks such as those described in Chapter 7 are particularly
useful to this end. Typical embedded test tasks include
quizzes, homework problems, laboratory activities, and other
procedures which require the participants to demonstrate
their ability to understand and use course concepts and
skills in specific ways. This type of assessment activity
is closely tied to the business of instruction. It is
important that the information collected from such assessment
be shared immediately or as soon as possible with the course
enrollees by instructors. Instructional decisions concerning
pace of activities, provision of individual assistance for
students, and prescription of remedial study or work for
individuals or groups have their basis in this ongoing
assessment of learning.

Course instructors also need to know how the students perform
at the end of the course in order to make judgments about the
effectiveness of the course for particular individuals and
groups compared to other persons and groups. The information
about end of course achievement is more meaningful to both
instructors and enrollees if information about the entry level
skill of participants is available from pre-tests or some
similar assessment procedure. Course instructors can more
readily interpret the post test achievement data resulting from
a course if information about the individual enrollees'
backgrounds has been collected. Information about prior course
work in prerequisite areas, the degree to which basic concepts
prerequisite to the course are used in daily work activities,
and the prior formal education of participants is useful for
revealing differential effectiveness of the course for different
persons.

Program Administration Needs

The administrators responsible for the operation of continuing
education courses also have strong interests in all of this
information. In addition, they need to know if the course was
taught effectively in terms of the instructor's behavior,
interpersonal style, and general competence both in the content
of the course and in the teaching of the class. Other
information, such as how the course was advertised, how
participants heard about the course, and how adequate the
location, physical setting, and time period used to teach the
course were, is also important to persons who operate such
programs. Consequently, information of this type needs to be
collected, tabulated, and used to make decisions about future
course offerings, instructor assignments, scheduled locations,
and optimal durations. The perceptions of course participants,
the perceptions of the persons who sent them to participate in
the course, and accurate records and descriptions of the
conditions under which courses operate are all important sources
of information which need to be collected, tabulated, and
summarized for the purpose of making these types of decisions.




Client Agency Needs

Information about the operation of a course, the
characteristics of its participants, the effectiveness of the
instruction in achieving intended learning outcomes, and related
matters is also of interest to the agencies and companies who
send engineers to participate in continuing education courses,
as well as to the prospective participants. In addition, the
former groups are interested in the qualifications of the course
instructors, the adequacy of the instructional materials and
facilities, and the commitment of the continuing education unit
program to continue to work with and support the learning needs
of engineers in specific regions and companies. For this reason
companies seek information about the reputation of the
continuing education program, the courses, and the instructors
who teach them. The tacit evaluation of the worth and
effectiveness of the particular course and the credibility of
the sponsoring institution is important when decisions are made
whether or not to involve one's employees in that course.
Information about the number of replications of a course with
different groups and testimonials from satisfied former
participants and their companies often provide the basis for
such decisions.

Professional Societies' Needs

Professional engineering societies concerned with awarding CEUs
for successful short course participation are interested in the
qualifications of continuing education program sponsors, the
quality of the courses taught, and the qualifications of the
instructors who teach the courses (Council on the Continuing
Education Unit, 1979; Martin Greenfest, 1980). The specific
learning outcomes on specific aspects of courses are not of
great concern to these groups. Rather, if the credibility of the
institution offering a continuing education program is
established, the assumption is usually made that the instruction
is valuable. It is expected that the course instructor will make
a valid assessment of individual students' learning and assign
some sort of grade which qualifies or does not qualify the
individual student for CEU credit (Enell, 1980). This is a
common pattern in higher education involving the accreditation
of institutions which then offer programs and make judgments
about the degree of individual student success in meeting
objectives for specific courses by whatever means.

Meeting Diverse Information Needs

Meeting these diverse information needs requires thorough
documentation related to planning, conducting, and evaluating
continuing education courses. Information about the abilities of
enrollees to perform the specific tasks being taught in a course
upon entry to and departure from the course is basic. Yet by
itself this information is not sufficient to explain why
learning did or did not occur. The differential effectiveness of
courses on the same topic, or of the same course for different
persons, or of the same courses for similar groups of persons
with different instructors and under different instructional
conditions cannot be explained by pre- and post test or other
forms of learning achievement assessments alone. Rather,
information about the conditions under which instruction and
learning have occurred, as well as about other variables
mentioned in the earlier sections of this chapter, is needed to
interpret why the observed learning outcomes result. It is
essential to keep good records concerning the details of when
continuing education courses are offered; who is assigned to
teach them; how they are advertised; what formats are selected
for their presentation; how many students are enrolled; and how
the content and instructional materials are developed, selected,
and presented. The forms included in Appendix A are designed to
collect data of this kind and serve as one set of examples of
how to systematically gather information about course
characteristics.

The need for large amounts of information does not require that
every participant in every replication of a course be asked to
complete every survey instrument, interview form, and available
test. Rather, as was suggested in earlier chapters, it is
expedient to sample persons within courses and the employers of
these persons to collect some of this information. After a
course has been in operation for some time and its properties
and characteristics have been determined and found to be
appropriate, it is only necessary to assess individual student
learning outcomes on the course content and to monitor
occasionally other aspects of participants, course instructors,
and course operation, as this information is needed for
reporting program characteristics and achievement to various
groups.

Basic Information: Student Achievement, Course, and
Instructor Characteristics



The basic means for assessing and reporting learning outcomes
suggested in this book is to use the results of pre-tests,
embedded tests, post tests, and delayed post tests. Among this
array of tests, pre- and post tests are most useful for making
inferences about the degree of individual student learning
resulting from a course and also the general level of course
effectiveness across persons. Embedded test tasks are more
useful in the ongoing instructional activities of a course for
instructional decision making. Delayed post tests are most
useful for examining long term course effects upon participants'
functional knowledge and skill in applying course content to
problems in a work setting.

Procedures for developing these types of tests, and means to
insure their validity and reliability, are outlined in Chapters
6 through 11. If data from such tests are collected, a variety
of judgments can be made about the effectiveness of the course
in reaching its intended learning outcomes. Information about
the progress of individual learners can be reported to them and
to persons they designate. Test data also can be accumulated in
various ways and be used to make improvements in the
organization, content, and teaching of the course (formative
evaluation). Test data can be aggregated to provide information
about the overall effectiveness of the course in meeting its
objectives (summative evaluation). Additional non-test data
concerned with participant characteristics, instructor
characteristics, and course operation can be collected with
instruments similar to those presented in Appendix A. These
descriptive data can be tabulated and used to explain the
effectiveness or lack of effectiveness of instruction, toward
improving the quality of future replications of specific
courses.

In summary, pre- and post test scores of individuals enrolled
in a course are the basic measures of achievement by which to
judge the degree of learning of individuals and the general
level of course effectiveness. Additional data collected on
course operating characteristics and instructor competence and
behavior are not measures of learning outcomes, but are
necessary to understanding the observed learning outcomes
measured by the pre- and post test instruments. Without the
second set of non-test data it is not possible to explain the
differential effectiveness of courses for different persons,
under different conditions, and with different instructors.
Decisions about courses, their organization, pacing, scheduling,
and staffing all require both the basic achievement data and the
descriptive data which can be gathered by procedures similar to
those described in Appendix A.

Examples of ways to report learning outcomes will now be
presented. It should be noted that learning outcomes based upon
pre- and post test or delayed post test scores are meaningful
only to the degree that the tests are reliable and valid. As was
pointed out in Chapter 10, this means that the course objectives
being measured by the test need to be stated in performance
terms; test items need to be mapped to the full range of
performance objectives; a matrix of objectives by test items by
topics is needed to insure that the test is representative of
the learning outcomes expected to result from the course; the
test tasks must be abbreviated and time efficient; and the test
tasks should be externally validated. The reliability of the
test items also needs to be established. Assuming all of the
above procedures have been carried out, the inferences about
degree of learning resulting from a course based upon pre- and
post test scores of individuals can be quite strong, especially
if the results are aggregated across persons. If the results
replicate from one teaching of the course to results obtained
with other groups, the inference about course effectiveness in
achieving desired learning outcomes can be very strong.

In the examples used, pre- and post test data for each student
enrolled in the course will be the basic information used to
make decisions about the degree of learning achieved by
individuals and on the average by groups. Delayed post test data
are generally hard to collect and are usually not collected
across all individuals. Rather, delayed post tests or other
assessments of performance after the course has been completed
for some time are usually sampled across persons and courses for
purposes of making inferences about the long term effects of
courses upon performance.

Gathering and Presenting Basic Achievement Data: An Example

Let us consider a short course of about three hours' duration.
The course is titled "Urban Storm Water Quality Modeling:
Removal and Impact." The course was one of three short courses
offered during the Sixth Annual International Symposium on Urban
Storm Runoff, held at the University of Kentucky in July, 1979.
The course included two hours of formal instruction followed by
one hour of example problem solving and discussion. The problems
of evaluating courses of this short duration are somewhat
special because of the limited time available. Yet the
procedures for collecting and reporting learning assessment data
are basically the same as for other courses.

This particular course was evaluated by the Learning Outcomes
Measurement Project as one of the activities of the group. The
course was taught by Dr. Michael Meadows, a civil engineer in
the College of Engineering. The purpose of the course was to
impart new concepts and skills in the topic of urban storm water
quality modeling. The course is an example of the third category
of courses described in Chapter 2, being concerned with
imparting advanced technical concepts and skills to practicing
engineers, technologists, and scientists. The course had never
been offered before. Participants successfully completing the
course were to be awarded 0.3 CEU from the College of
Engineering, University of Kentucky. A total of 31 persons
completed the course in three different replications, with eight
persons in the first session, fifteen in the second session, and
eight in the final session. The participants were typically
civil engineers working as consultants or for federal, state, or
local government agencies.

The major intention of the evaluation was to determine the
effectiveness of the course, rather than to make strong
individual assessments of each student's learning. The reason
for this is that the time available for instruction was very
short. No more than a few minutes could be devoted to testing,
with a maximum of ten minutes allocated for the pre-test and
another ten for the post test.

The course instructor developed twelve test items of the essay
or constructed response type. The twelve items were sorted into
three categories: easy items, moderately difficult items, and
difficult items. Test items not meeting these criteria were
rewritten in order that an equal number of items in each
category was obtained for each test form.

One test item from each category was randomly assigned to an
individual form of the test. Four forms were constructed for use
as pre-tests and post tests. The actual test questions and their
assignment to the four different forms of the test are shown in
Table 7.
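The assignment procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not say which of the twelve items fell into which difficulty category, so the grouping in `ITEMS_BY_CATEGORY` is hypothetical, as are the function name `build_forms` and the seed value.

```python
import random


def build_forms(items_by_category, form_names, seed=None):
    """Randomly give each form exactly one item from every difficulty category."""
    rng = random.Random(seed)
    forms = {name: [] for name in form_names}
    for category, items in items_by_category.items():
        if len(items) != len(form_names):
            raise ValueError(
                f"category {category!r} needs exactly {len(form_names)} items"
            )
        shuffled = list(items)
        rng.shuffle(shuffled)  # random order within the category
        for name, item in zip(form_names, shuffled):
            forms[name].append(item)
    return forms


# Hypothetical grouping of the twelve item numbers (not given in the report).
ITEMS_BY_CATEGORY = {
    "easy": [1, 2, 3, 4],
    "moderate": [5, 6, 7, 8],
    "difficult": [9, 10, 11, 12],
}

FORMS = build_forms(ITEMS_BY_CATEGORY, ["A", "B", "C", "D"], seed=1979)
for form_name, item_numbers in FORMS.items():
    print(form_name, item_numbers)
```

Each form produced this way holds three items, one per category, and every item is used exactly once, which is the balance the rewriting of items was meant to secure.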

An item sampling procedure was carried out to insure an
adequate assessment of the learning outcomes of the course. This
is consistent with procedures designed to estimate the effects
of courses on achievement of persons generally (Lord & Novick,
1968; Shoemaker, 1973). However, the performance of each
individual on these tests is only a partial estimate of each
individual student's learning in relation to the total course
content. Yet, since several individuals responded to each
question on the pre-test and to several other items on the post
test, a relatively accurate estimate of the entry knowledge
level of participants as a group and the growth in knowledge
following the course activities can be obtained.

The course instructor graded each question in terms of the
knowledge displayed in the answer which had been constructed by
the individual participant. Three scoring categories were used,
with a 0.0 value assigned for answers with no knowledge of the
concept, 0.5 being used for scoring the presence of some
knowledge, and 1.0 being assigned for correct knowledge of the
question content. Each item was scored in this manner for each
person. The total possible score for any person was 3.0 and the
minimum score possible was 0.0. Responses to both the pre- and
the post test were scored in this manner by the instructor and
the results for individual students on the four test forms were
recorded.

                            Table 7

        Pre- and Post Test Items and Their Assignment to
           Test Forms for the Urban Storm Water Course

 1.  How are the parameters for stormwater pollutant washoff
     models determined?

 2.  Which stormwater pollutant washoff model are you familiar
     with? What is your major criticism of this model?

 3.  What is the state-of-the-art model (equations) for routing
     unsteady streamflow?

 4.  In stormwater and quality modeling, what does the term
     "regionalization" mean?

 5.  When are the celerities of streamflow and water quality
     models the same?

 6.  Distinguish between dynamic and kinematic waves.

 7.  How can a person best simulate the dynamic response of a
     receiving stream's water quality system to stormwater
     pollution?

 8.  Are all one dimensional streamflow and water quality
     routing models compatible? Explain your answer.

 9.  What model would you recommend for routing streamflow
     during periods of stormwater runoff?

10.  What process(es) is (are) involved in the washoff of
     pollutants from an urban watershed?

11.  How can data collected at one watershed be transferred to
     another watershed?

12.  During the preliminary assessment phase of an area-wide
     quality study, how can a person identify those land use
     areas that are potentially significant sources of
     stormwater pollution?

     FORMS     PRE-TEST QUESTIONS     POST-TEST QUESTIONS

       A            1, 2, 3                3, 9, 11
       B            4, 5, 6                2, 4, 12
       C            7, 8, 9                5, 7, 10
       D           10, 11, 12              1, 6, 8
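The scoring rule described above (0.0, 0.5, or 1.0 per item, three items per form) is simple enough to state as a short Python sketch. The names `ALLOWED_ITEM_SCORES`, `total_score`, and `POSSIBLE_TOTALS` are illustrative and do not come from the report.

```python
from itertools import product

# The three scoring categories used by the instructor.
ALLOWED_ITEM_SCORES = (0.0, 0.5, 1.0)
ITEMS_PER_FORM = 3


def total_score(item_scores):
    """Sum the three item scores on one form, rejecting values outside the scheme."""
    if len(item_scores) != ITEMS_PER_FORM:
        raise ValueError(f"expected {ITEMS_PER_FORM} item scores")
    for score in item_scores:
        if score not in ALLOWED_ITEM_SCORES:
            raise ValueError(f"{score} is not one of {ALLOWED_ITEM_SCORES}")
    return sum(item_scores)


# Every distinct total a person can obtain on a three-item form.
POSSIBLE_TOTALS = sorted(
    {sum(combo) for combo in product(ALLOWED_ITEM_SCORES, repeat=ITEMS_PER_FORM)}
)

print(total_score([1.0, 0.5, 0.0]))
print(POSSIBLE_TOTALS)
```

Enumerating the totals shows that a three-item form scored this way yields seven possible score categories, from 0.0 to 3.0 in steps of 0.5.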

The equivalence of the various forms of the test in the pre-
and post test roles was determined by a one-way analysis of
variance across the scores of participants on the four forms of
the test. In this procedure it is assumed that, because of
random assignment of participants to test forms on both the
pre- and post test, all participant groups can be viewed as
being equivalent in terms of their expected mean scores on the
tests. Therefore, if there are statistically significant
differences between the four forms of the test in either the
pre- or post test role, the non-equivalence of the test forms
would be suggested. In light of such findings, the test items
and forms might need to be reworked.

Table 8 contains the observed total scores for the 33 persons
who began the course and were randomly assigned to one of the
four forms in the pre-test role. Persons from all three
replications of the course were pooled for the analysis. In
Table 8 the actual test scores for persons across the four forms
of the pre-test are listed along with the mean scores and
standard deviations for each form. Following this information a
one-way analysis of variance table is presented.

                            Table 8

            Pre-Test Total Scores Across Test Forms
           Urban Storm Water Quality Modeling Course

                           Test Form

                  A         B         C         D

                 3.0       1.0       1.5       1.0
                 0.0       0.0       0.0       2.0
                 3.0       0.0       0.0       2.5
                 1.5       0.0       0.0       2.5
                 0.5       0.0       0.5       1.0
                 1.0       0.5       2.0       2.0
                           2.0       0.5
                           0.0       0.5
                           0.0       1.0
                           0.0
                           0.5
                           1.0

    Statistic
      n           6        12         9         6
      Mean      1.500     0.417     0.667     1.833
      s         1.265     0.634     0.707     0.683

                      One Way Anova Table

    Source             SS      df      MS       F        p

    Between Forms    10.629     3     3.543    5.480    0.004
    Within Forms     18.750    29     0.647
    Total            29.379

It is apparent that the four forms of the test are not
equivalent for the 33 persons who completed the pre-test. Forms
D and A were the easiest, and forms B and C the most difficult.
Of course, these findings could be due to differences in the
entry level knowledge of participants in the four groups even
though they were randomly assigned. By chance alone, more able
persons may have been assigned to forms A and D. However, the
test items and forms should be reworked to insure more equality
of assessment of entry level knowledge in the course content,
since the statistical test shows that the probability of such a
distribution of scores under random assignment of persons to
test forms is low.
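The one-way analysis of variance reported for the pre-test can be checked with a short, dependency-free Python sketch. The scores below are read from Table 8; one Form B entry is garbled in the available copy and is taken here as 0.5, which is an assumption, though it is the only value consistent with the n, mean, and standard deviation reported for that form. The function name `one_way_anova` is illustrative. The p value is not computed, since that requires the F distribution from a statistics package.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares, and F."""
    all_scores = [x for scores in groups.values() for x in scores]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for scores in groups.values():
        group_mean = sum(scores) / len(scores)
        ss_between += len(scores) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in scores)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Pre-test total scores by form, as read from Table 8.
# The sixth Form B value (0.5) is an assumed reading of a garbled entry.
PRE_TEST = {
    "A": [3.0, 0.0, 3.0, 1.5, 0.5, 1.0],
    "B": [1.0, 0.0, 0.0, 0.0, 0.0, 0.5, 2.0, 0.0, 0.0, 0.0, 0.5, 1.0],
    "C": [1.5, 0.0, 0.0, 0.0, 0.5, 2.0, 0.5, 0.5, 1.0],
    "D": [1.0, 2.0, 2.5, 2.5, 1.0, 2.0],
}

RESULT = one_way_anova(PRE_TEST)
for name, value in RESULT.items():
    print(f"{name:>10}: {value:.3f}")
```

Run on these scores, the sketch gives the between-forms and within-forms sums of squares (10.629 and 18.750) and the F ratio (5.480) shown in the analysis of variance table. The same function could be applied to the post test scores in the same way.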

Table 9 is a similar presentation of the individual total
scores of the 31 participants who actually completed the course
across the four forms used as a post test. Again the individual
scores of persons and test form means, standard deviations, and
an analysis of variance table are presented. Persons from all
three replications of the course are pooled. The results
indicate that in the post test role the four forms of the test
are not significantly different from one another. Of course,
this may be caused by the fact that the instruction in the
course has caused all persons to master the material to such a
high level that differences in the difficulty level of the four
forms of the test no longer exist. Or it may be that the four
forms of the test are equivalent.

                            Table 9

            Post Test Total Scores Across Test Forms
           Urban Storm Water Quality Modeling Course

                           Test Form

                  A         B         C         D

                 2.0       2.5       2.5       3.0
                 2.5       2.0       2.0       3.0
                 3.0       3.0       2.0       2.0
                 3.0       2.5       2.5       1.0
                 2.0       2.0       1.5       3.0
                 3.0       3.0       2.5       2.5
                 2.5                 1.5       2.5
                                     2.0       2.0
                                     3.0       1.5

    Statistic
      n           7         6         9         9
      Mean      2.571     2.500     2.167     2.278
      s         0.450     0.447     0.500     0.712

                      One Way Anova Table

    Source             SS      df      MS       F        p

    Between Forms     0.827     3     0.276    0.900    0.454
    Within Forms      8.270    27     0.306
    Total             9.097

For purposes of this example, it is assumed that the four forms
of the test are equivalent. It is also assumed that the tests
have been properly constructed, and that they are both valid and
reliable. Administration of the tests in the pre- and post test
roles to other groups and subsequent analyses would help confirm
or refute these assumptions, as would the carrying out of
appropriate item analysis and test form reliability and validity
studies (see Chapters 10 and 11). It should be noted that the
method of assigning persons to test forms in both the pre- and
post testing, although random, prevented any person from taking
the same form of the test in both the pre- and post test
situation.

Figures 2, 3, and 4 show the results of the three replications
of the course on both pre- and post tests. Pre- and post test
scores are plotted against the rank of persons in each section
of the course. Persons in Groups 1 and 3 have been ranked in
order of the overall quality of their total written responses on
the post test, a procedure carried out in addition to the
individual grading of items. Since there are 8 persons in each
group, there are 8 ranking categories. For Group 2, which had 15
persons, the actual observed post test score categories are used
for the ranking.

[Figure: Pre- and Post Test Results, Urban Storm Water Quality
Modeling: Removal and Impact, Group 1. Vertical axis: score on
tests (0 to 3). Horizontal axis: student code, post test rank
(1 to 8), and test form taken.* Plotted: individual pre-test
scores; individual post test scores; test mean scores across
persons for pre- and post tests; person rank on post test by
pre- and post test regression lines; test standard deviations.
Value of maximum score = 3; value of minimum score = 0.]

Figure 2. Illustration of a graphic means for reporting learning
outcome measurements for a short course to individual
participants and groups.

*See Table 7 for determination of test form.



[Figure: Pre- and Post Test Results, Urban Storm Water Quality
Modeling: Removal and Impact, Group 2, n = 15. Vertical axis:
score on tests (0 to 3). Horizontal axis: post test rank (1 to
5). Plotted: individual pre-test scores; individual post test
scores; test mean scores across persons for pre- and post tests;
person rank on post test by pre- and post test regression lines;
test standard deviations. Pre-test: mean = 1.167, s = 1.097.
Post test: mean = 2.367, s = 0.611.]

Figure 3. Learning outcomes resulting from a short course for
engineers on Urban Storm Water Quality Modeling.

Individual persons' pre- and post test scores across forms
cannot be linked in this figure because persons are ranked by
score categories rather than by individual persons.



[Figure: Pre- and Post Test Results, Urban Storm Water Quality
Modeling: Removal and Impact, Group 3. Vertical axis: score on
tests (0 to 3). Horizontal axis: student code, post test rank
(1 to 8), and test form taken.* Plotted: individual pre-test
scores; individual post test scores; test mean scores across
persons for pre- and post tests; person rank on post test by
pre- and post test regression lines; test standard deviations.
Post test: m = 0.238, b = 1.179, mean = 2.250, s = 0.598.
Pre-test: m = 0.155, b = -0.196, r = 0.819, mean = 0.500,
s = 0.463. Value of maximum score = 3; value of minimum
score = 0.]

Figure 4. Learning outcomes resulting from a short course for
engineers on Urban Storm Water Quality Modeling.

*See Table 7 for determination of test form.



Since only 5 of the 7 possible total score categories
occurred in Group 2, there are 5 ranks.

A mean, standard deviation, and regression line are presented
for both the pre-test and the post test for each of the three
groups. The information is presented graphically as well as
numerically. A quick glance at the three figures reveals the
basic patterns. The post test means are seen to be uniformly
high and about the same value for all three groups. The pre-test
means are seen to be much lower and about the same value for
Groups 1 and 3, but to be higher for Group 2. The greater
variability of the four forms of the test across the persons on
the pre-test is immediately apparent, especially for Groups 1
and 2. The most striking feature of the graphs is the consistent
and large difference between the pre-test and post test means
across the three groups.
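The summary values attached to each plot (mean, standard deviation, and the slope m, intercept b, and correlation r of score regressed on post test rank) can be computed as in the following Python sketch. The individual scores for each group are not legible in the available copy, so the two score lists below are illustrative only: they were chosen to reproduce the summary statistics reported for Group 3 in Figure 4, and the pre-test arrangement in particular is one of several that would do so. The function name `summarize` is hypothetical.

```python
from math import sqrt


def summarize(ranks, scores):
    """Mean, sample standard deviation, and least-squares line of score on rank."""
    n = len(scores)
    mean_rank = sum(ranks) / n
    mean_score = sum(scores) / n
    sxx = sum((x - mean_rank) ** 2 for x in ranks)
    syy = sum((y - mean_score) ** 2 for y in scores)
    sxy = sum((x - mean_rank) * (y - mean_score) for x, y in zip(ranks, scores))
    slope = sxy / sxx
    return {
        "mean": mean_score,
        "s": sqrt(syy / (n - 1)),                 # sample standard deviation
        "m": slope,                               # regression slope
        "b": mean_score - slope * mean_rank,      # regression intercept
        "r": sxy / sqrt(sxx * syy),               # correlation of score with rank
    }


# Eight persons ordered by post test rank (1 = lowest ranked response).
RANKS = [1, 2, 3, 4, 5, 6, 7, 8]
# Illustrative scores consistent with the Group 3 statistics in Figure 4.
PRE_SCORES = [0.0, 0.0, 0.5, 0.5, 0.0, 1.0, 1.0, 1.0]
POST_SCORES = [1.5, 1.5, 2.0, 2.0, 2.5, 2.5, 3.0, 3.0]

PRE = summarize(RANKS, PRE_SCORES)
POST = summarize(RANKS, POST_SCORES)
MEAN_GAIN = POST["mean"] - PRE["mean"]

for label, stats in (("pre-test", PRE), ("post test", POST)):
    print(label, {key: round(value, 3) for key, value in stats.items()})
print("mean gain:", MEAN_GAIN)
```

With these illustrative scores the group mean rises from 0.500 on the pre-test to 2.250 on the post test, a mean gain of 1.75 points on the 3.0 point scale, which is the kind of large pre- to post test difference the figures display.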

• ' ^ ' ^ ■ , '* ■ \ . 

•^-ta_of this- type . collected :0-n well designed pre- .and 

post t«?sts across ;repliqations» or courses is strong evijience 

of -the degree of learning, which'has^ resulted from th^'course. 

-Presented in graphic form it is mufch^ more . ^interpret able than 

- - • * " ' _ 

if simply* presented numerically/ C6mmercial*.comp'ufer- 

programs eX'ist which allow for the easy tabula*tiOn of pf^- . 

-and post te«f data and, the construction cff plots similar 

to t'hose presented in Ficures .1 through 3. /.The information^ 

gained from siTch data is useful for both formative and , ' 



sununative , evaluation orocerlures. It can be used for« ' 
reporting results of learning outcomes to manv qrouos ot' 
persons; includinq course participants, course ^ instructors , 
program administrators, client agencie^s who sent participants, 
and accrediting groups from professional organizations.. It 
is 'the most basic data which can be obtained about the 
learning outcomes of the course. Without this or similar. 
informatioYi about the performance capabilities of course 
participants at jthe beginning an-d end of the course, little 
can be said with assurance about the degree .of' learning which 
has resul4:ed .for individuals or for groups taken as a whole. 

For these reasons the evaluation of individual learning
outcomes or the evaluation of course effectiveness generally
needs to be based upon some similar procedure. The data
collected are not only useful for reporting the individual
achievements of particular students to them, but, collected over
replications of a course and over many courses within a program,
they can be very effective in making assessments of the
effectiveness of courses, various instructional organizations
and arrangements, and the credibility of programs of continuing
education offered by universities or other agencies.

Reporting Learning Outcomes to Individual Students

It is well established that immediate feedback concerning the
accuracy and adequacy of performance facilitates student
learning and motivation. Course instructors should always
provide students with the results of corrected homework
problems, quizzes, laboratory exercises, and other types of
embedded test tasks as soon as possible after they have been
administered. Immediate knowledge of results is desirable in
these situations. This procedure allows individual students to
compare their own recently completed performances to those
presented by the instructor. Oftentimes the instructor cannot
provide students immediately with their corrected and scored
homework problems, exercises, and quizzes because time is needed
to complete the correction of student responses. An alternative
procedure is to hand out common sets of correctly worked
problems, solutions to test problems, and laboratory exercises.
This method achieves immediate feedback to students about the
adequacy of their recently completed performance against a
detailed example of how the problems or exercises should have
been completed. However, correcting and grading of individual
students' homework and other papers and prompt return of these
to students remains important.

In short courses, similar to those often used in
continuing education activities in engineering, time is so
limited that it is difficult to correct students' work
promptly and to arrange adequate opportunity for individuals
to examine and reflect on these results. Some of the methods
for overcoming this obstacle are noted in Chapter 7 in the
discussion of embedded testing procedures. It is important
that both students and instructors quickly obtain knowledge
of results from embedded test tasks in short courses.
Otherwise the assessment procedures serve no useful function
for guiding learning and instruction. The Haan and Barfield
(1978) "Hydrology and Sedimentology of Surface Mined Lands"
course, as it is described in Chapter 7, is one good example of
how to provide students in short courses with immediate
knowledge of the results of their performance. This course makes
use of embedded test tasks, sample problems, and completely
worked solutions handed out as soon as participants have
completed assigned work.

One advantage of the multiple choice test format or
other short answer objective tests is that they can be
scored immediately after students have completed them. A
standard scoring sheet can be used by students to mark the
appropriate answer to each question. The answer sheet can
be scored by machine immediately, right in the classroom,
if the proper equipment is available. Equipment for this
purpose is currently commercially available. However, even
without such equipment, standardized answer sheets for
multiple choice questions can be scored by hand using a
scoring overlay or a master answer sheet. A test with as
many as thirty items can be scored in as little as 15 seconds
by this procedure. Furthermore, the correct answer to each
question can be marked on the student's paper by marking
through the opening on the master answer sheet. The total
number of errors can be counted as scoring proceeds, noted
on the student's answer sheet, and recorded by the
instructor. The scored answer sheet, with the correct responses
to items the student missed, can be returned to the individual
student a few seconds after he or she has completed the test.
The student can be allowed to go over his or her own test using
the corrected answer sheet and the test booklet which
contains the questions and problems. In addition, a
solution sheet can be provided which explains why a particular
answer to each question on the test is correct and why the
distracting options for each question are wrong. Use of
this procedure is extremely effective in providing students
with information about their performance. Tests carried out
in this manner are very instructive to students, who quickly
identify what specific areas or concepts they do not yet
understand and need to master. The immediate
scoring and recording of the test results by the instructor
also alerts him or her to problems that individuals or groups
are having. Often, immediate corrective action can be taken
by the instructor when the testing is completed in order to
remedy problem areas.
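The hand-scoring procedure described above (lay the key over the sheet, count the errors, and mark the correct response for each missed item) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `score_answer_sheet`, the use of `None` for an unanswered item, and the five-item key are assumptions made for the example and do not come from the report.

```python
def score_answer_sheet(responses, key):
    """Score one answer sheet against a scoring key.

    responses and key are equal-length sequences of option letters.
    An unanswered item is represented by None (an assumption of this sketch).
    Returns (n_errors, corrections), where corrections maps the 1-based
    item number to the correct option for every item the student missed.
    """
    if len(responses) != len(key):
        raise ValueError("answer sheet and key must have the same number of items")
    corrections = {}
    for item, (given, correct) in enumerate(zip(responses, key), start=1):
        if given != correct:
            # Equivalent to marking through the punched opening in the key.
            corrections[item] = correct
    return len(corrections), corrections


# Hypothetical five-item test, for illustration only.
key = ["B", "D", "A", "C", "E"]
sheet = ["B", "A", "A", None, "E"]
errors, corrections = score_answer_sheet(sheet, key)
percent_correct = 100.0 * (len(key) - errors) / len(key)
print(errors, corrections, percent_correct)
```

The dictionary of corrections plays the role of the marks made through the key, so the same record serves both the student and the instructor's tally of errors.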

Figure 5 is an example of an actual answer sheet scored
by a hand key and returned to the student immediately after
the test was completed. The scoring key is simply another
answer sheet with spaces punched out for the correct answer
to each question. If no standard answer sheets are available,
suitable answer sheets can be constructed by typing rows and
columns of zeros or capital "Os" on a plain sheet of paper,
and adding numbers for rows and letters for options within
rows. The master or scoring key is made and used in the same
manner as is the case with the sample answer sheet in Figure 5.

Figure 5

Sample Standard Answer Sheet for Manual or Machine Scoring

[Scanned general purpose answer sheet with lettered options A
through E for each numbered item. One mark on the sheet
indicates the student's individual response to each item. A
second mark indicates the correct response as scored by the
instructor using the scoring key when the student has responded
incorrectly.]

Similar procedures can be used with other objectively
scored test items. If the test items result in a particular
numerical value, the construction of a particular diagram,
or some other standard response which can be quickly and
objectively scored, relatively immediate knowledge of results
can be communicated to students by scoring tests and returning
them to students as soon as they are completed.
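Where no standard answer sheets are at hand, the typed sheet of numbered rows and lettered columns of capital "Os" mentioned above could equally be produced by a short program. The following is a minimal sketch; the function name `typed_answer_sheet` and the layout widths are assumptions made for illustration.

```python
def typed_answer_sheet(n_items, options="ABCDE"):
    """Return the text of a plain answer sheet.

    The sheet has a header row of option letters and one numbered row per
    item, with a capital "O" under each letter for the student to mark.
    """
    header = "     " + " ".join(options)
    rows = [
        f"{item:>3}. " + " ".join("O" for _ in options)
        for item in range(1, n_items + 1)
    ]
    return "\n".join([header] + rows)


print(typed_answer_sheet(3))
```

A second copy of the same sheet, with the correct option removed in each row, would serve as the scoring key.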

It should also be recalled that multiple choice test
items can be used for any type of testing situation. The
sample test in Appendix B illustrates how complex performance
capabilities may be tested by well designed and
abbreviated test tasks. Much is said about this in earlier
chapters. It will suffice to note that if more attempts
were made to encapsulate cogently the basic features of
the intended learning outcomes of courses within well
constructed test items similar to those shown in Appendix B,
it would be much easier to score student performance
immediately after assessment and communicate the results of
individuals' performance to them at once.




At the end of courses it is also important to communicate
the achievement results to the individual. Information about
how much each person has learned with respect to his or her
entry level knowledge is of interest to the participant. If the
course is small and the pre- and post test scores of
individuals have been plotted against some criterion of
performance capability (e.g., Figures 2 and 4), this information
can be shared with individual students following the course.
In both Figures 2 and 4 each student can identify him or
herself by the student code number. The pre- and post test
scores of the individual can be identified. The person's
performance in relation to other persons in the group and
in relation to the degree of mastery can all be determined.

For courses in which it is not possible to list each
individual's performance on a group performance graph, it
is still possible and important to provide participants
with information on their own performance in the course.
Figure 6 is an example of a standard form which can be used
to report the achievement of individual students
at the completion of a course. Typically it
might require a few days to compile and prepare all of the
individual achievement reports for a course. These can be
mailed out to students at the conclusion of the course. With
efficient scoring and processing procedures, it is often
possible to provide participants with this type of information
prior to their departure from a course. In any event, the
opportunity to share with individual students the results
of their performance on the pre-test and the post test should
not be dependent upon completion of the achievement report.
Rather, by using the methods described above, these tests
can also be scored immediately and the results communicated
to the students with corrected answer sheets, the test
question booklets, and solution sheets. If the test results
are not shared with students immediately in this or some
similar manner, the instructional value of the testing
procedure is lost.

Figure 6

MANUAL INDIVIDUAL ACHIEVEMENT REPORTING FORM

Course: Urban Storm Water Quality Modeling: Removal
        and Impact
Instructor(s): Dr. Michael Meadows
Date(s): July 24, 1979

Your performance on the pre-test and post test is reported
below in both graphic and numerical form. In addition,
information on the performance of the class as a whole is
also recorded.

Class Performance
                    Pre-test (x1)     Post test (x2)
n                   8                 [blank in source]
X (%)               [illegible]       81.27
s.d. (%)            25.20             13.90

Your Performance
                    Pre-test (x1)     Post test (x2)
% Score             [illegible]       [illegible]
Percentile Rank*    [illegible]       [illegible]

Plot of individual and group performance: Pre- and
post test

0% |------------------------------------------------| 100%

n = no. of participants
X = mean test score
s.d. = standard deviation
Separate symbols on the line mark your pre-test score and
your post test score.

We hope this information will be beneficial to you.
Should you have questions or comments please contact this
office.

*Percentile ranks based upon rank order scores of pre- and
post test scores shown in Figure 2, page 278.

Even when students have received immediate feedback on
their performance on learning assessment activities, there is
still a need to make a summary report to each student at the
end of the course. Figure 6 is a completed achievement
report for one student from the Urban Storm Water Quality
Modeling course. The student may be identified in Figure 2.
The name listed in Figure 6 is fictitious, but the results
are for a real person enrolled in this course. The form is
designed for easy use by clerical staff involved with the
continuing education program. Only the information which is
written in by hand needs to be prepared for each individual
student. The remainder of the information pertains to the
learning outcomes for the entire course. If a course has many
participants enrolled, the information which applies to the
entire course can be typed on a master copy for that course.
Sufficient copies can be duplicated to prepare an
individual report for each enrollee. The specific information
for each individual can then be added with minimum effort.
This is illustrated in Figure 6 by the handwritten information.
The name of the course, the instructor, and the date can
all be typed in on a master. The same procedure can be
followed for the information about the class performance.
In addition, the pre- and post test means and standard
deviations can also be plotted on the master. All the copies
needed for the total enrollment of the course can then be
duplicated. What remains is simply to add the information
needed for any individual. This amounts to adding the
person's name and student code number, listing the person's
pre- and post test scores and percentile ranks, and plotting
the person's pre- and post test scores on the line which
already contains the group means and standard deviations.
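The individual entries that remain to be added to each copy (percentage scores and percentile ranks) are simple to compute. The sketch below is illustrative only. The report states that percentile ranks are based on the rank order of scores but does not give a formula, so the common convention used here (percentage of the class scoring below, plus half of those tied) is an assumption, as are the function names and the eight hypothetical raw scores on a 12-item test.

```python
def percent_score(raw_score, n_items):
    """Raw score expressed as a percentage of the total possible score."""
    return 100.0 * raw_score / n_items


def percentile_rank(score, class_scores):
    """Percentile rank of score within class_scores.

    Assumed convention: percent of the class scoring below the given score,
    plus half of the percent of the class tied with it.
    """
    below = sum(1 for s in class_scores if s < score)
    tied = sum(1 for s in class_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(class_scores)


# Hypothetical raw scores for a class of eight; not the data of Figure 2.
N_ITEMS = 12
class_pre = [1, 2, 2, 3, 4, 4, 5, 7]
class_post = [8, 9, 9, 10, 10, 10, 11, 12]

# Entries for one hypothetical student who scored 4 and then 10.
entries = {
    "pre_percent": round(percent_score(4, N_ITEMS), 1),
    "post_percent": round(percent_score(10, N_ITEMS), 1),
    "pre_percentile": percentile_rank(4, class_pre),
    "post_percentile": percentile_rank(10, class_post),
}
print(entries)
```

Only these four values, together with the name and student code number, differ from one copy of the form to the next.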

Plotting the group pre- and post test means and standard
deviations, along with the individual's scores on the two
tests, on the line at the bottom of the form
is facilitated by the length of the line and the metric
in which the test scores are reported. The line is 10
centimeters long. The scoring metric is percentage of
the total possible score. It is a simple matter to plot an
individual's score directly, as a percentage, on the 10
centimeter scale with a metric ruler. If time is short, the
group means and standard deviations can be plotted on the
master form for the group, and the individual can be instructed
in a standard comment on the form to plot his or her own
scores on the line if so desired.
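Because the line is 10 centimeters long and scores are reported as percentages, each percentage point corresponds to one millimeter on the ruler. The conversion, and a rough text rendering of the line, might be sketched as follows. The function names, the character width, and the plotting symbols are assumptions made for illustration.

```python
LINE_LENGTH_CM = 10.0  # length of the plotting line stated in the text


def position_cm(percent):
    """Distance from the 0% end of the line, in centimeters."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    return LINE_LENGTH_CM * percent / 100.0


def text_scale(marks, width=51):
    """Render the 0% to 100% line as text.

    marks maps a one-character symbol to a percentage score. With the
    default width of 51 characters, each character spans two percent.
    """
    line = ["-"] * width
    for symbol, percent in marks.items():
        index = round(position_cm(percent) / LINE_LENGTH_CM * (width - 1))
        line[index] = symbol
    return "0% " + "".join(line) + " 100%"


# Hypothetical pre-test (a) and post test (b) percentage scores.
print(position_cm(50.0))
print(text_scale({"a": 30.0, "b": 80.0}))
```

A post test class mean of 81.27 percent, for instance, falls a little more than 8.1 centimeters from the left end of the line.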

The form is simple to use. It provides individual
participants with the basic information they need concerning
formal assessment of their learning outcomes in a course.
In addition, if a form similar to this one were prepared for
each person in each course in a continuing education program
over many offerings of courses, much information about course
effectiveness over replications and much information about
program effectiveness across courses could be accumulated.
While this information alone will not serve to replace the
tacit evaluation of courses and the institutions which offer
them by the client agencies and groups who enroll in
continuing education courses, such information can be very
useful in a supporting way. It can also serve as an aid to
quality control, improvement of courses and programs of
study, and evidence of the worth of continuing education
activities operated by universities, colleges and other units.
Accrediting agencies, professional societies, and governing
boards and groups all have legitimate interests in this type
of achievement data aggregated over courses.

The task of preparing the individual achievement report
can be simplified by computer processing. Figure 7 is an
example of a computer prepared report. The report is for a
student enrolled in one section of the "Hydrology and
Sedimentology of Surface Mined Lands" course. In this case, the
report is for a real person enrolled in the class. In the
interest of privacy, the person's name and address, which
would normally be listed on the computer printout, are omitted.
All the basic information is given about the individual's
performance and the performance of the group. In addition,
test reliability and standard error of estimate values are
given. For classes with large enrollments the form could be
prepared directly from data students generated in their
responses to multiple choice questions on standard, machine
scorable answer sheets. There are many ways to automate the
processing of test data and the preparation of individual
achievement reports.

Figure 7

Computerized Individual Achievement Reporting Form

OFFICE OF CONTINUING EDUCATION

COURSE NAME:      HYDROLOGY AND SEDIMENTOLOGY FROM SURFACE MINED LANDS
SITE OR SECTION:  TULSA, OKLAHOMA
DATE OF COURSE:   [blank in source]
INSTRUCTOR(S):    DR. BILLY BARFIELD AND DR. TOM HAAN

CLASS PERFORMANCE (N = [illegible])

                        PRETEST       POSTTEST      GAIN
AVERAGE TEST SCORE      [illegible]   [illegible]   [illegible]
STANDARD DEVIATION      [illegible]   [illegible]
TEST RELIABILITY        [illegible]
STD. ERROR OF MEAS.     1.7
PERCENT OF THE CLASS
MASTERING THE TEST      6.0           60.0

YOUR TEST SCORE         13            [illegible]   [illegible]

AS PART OF THE ABOVE COURSE, YOU WERE GIVEN TWO 2[?] QUESTION MULTIPLE
CHOICE TESTS. THE TEST QUESTIONS WERE CLOSELY RELATED TO THE MATERIAL
COVERED DURING THE COURSE. THE CLASS WAS NOT EXPECTED TO DO WELL ON
THE FIRST OR PRE-TEST. IF IT WERE OTHERWISE, THE COURSE MIGHT
HAVE LITTLE VALUE TO PARTICIPANTS. ON THE OTHER HAND, THE CLASS WAS
EXPECTED TO DO QUITE WELL ON THE POSTTEST WHICH WAS ADMINISTERED AT
THE END OF INSTRUCTION. FROM THE VIEWPOINTS OF THE INSTRUCTORS AND
COURSE DESIGNERS, A POSTTEST SCORE OF 17 OR BETTER IS EVIDENCE THAT THE
CLASS AS A WHOLE AND PARTICIPANTS INDIVIDUALLY MET THE STATED LEARNING
OBJECTIVES FOR THE COURSE.

AN INDIVIDUAL'S POSTTEST SCORE, HOWEVER, DEPENDS ON MANY FACTORS
SUCH AS, FOR INSTANCE, ONE'S PRIOR KNOWLEDGE AND FAMILIARITY WITH
THE GENERAL DISCIPLINARY AREA AS WELL AS THE SPECIFIC SUBJECT MATTER
OF THE COURSE AS THESE ARE MEASURED BY THE PRETEST. WHETHER OR NOT
A PARTICIPANT'S POSTTEST SCORE EQUALLED OR EXCEEDED 17 IS LESS IMPORTANT
THAN IS THE DIFFERENCE OR GAIN SCORE BETWEEN THE POSTTEST AND THE
PRETEST. FOR THE CLASS AS A WHOLE, THIS DIFFERENTIAL GAIN MEASURES
LEARNING OCCURRING AS A BENEFIT OF INSTRUCTION AND COURSE PARTICIPATION.
FOUR OR FIVE POINTS DIFFERENCE BETWEEN PRETEST AND POSTTEST INDICATES
SIGNIFICANT IMPROVEMENT AS A RESULT OF ATTENDING THE COURSE.

WE HOPE THIS INFORMATION WILL BE BENEFICIAL TO YOU. SHOULD YOU
HAVE QUESTIONS OR COMMENTS, PLEASE CONTACT THIS OFFICE.
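One way the class-level entries of a computerized report might be produced from machine-scored data is sketched below. It is not the program used to prepare Figure 7, which the report does not describe. The choice of the Kuder-Richardson formula 20 as the reliability estimate, the use of population rather than sample standard deviations, and all names and data are assumptions made for illustration; only the mastery score of 17 comes from the text of the figure.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def kr20(item_matrix):
    """Kuder-Richardson formula 20 for 0/1 item scores, one row per student."""
    n_students = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_variance = pvariance(totals)
    if n_items < 2 or total_variance == 0:
        raise ValueError("need at least two items and some spread in total scores")
    # Sum over items of p * (1 - p), where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_students
        pq_sum += p * (1.0 - p)
    return (n_items / (n_items - 1)) * (1.0 - pq_sum / total_variance)


def percent_mastering(scores, mastery_score):
    """Percent of the class scoring at or above the mastery score."""
    return 100.0 * sum(1 for s in scores if s >= mastery_score) / len(scores)


def class_summary(pre, post, mastery_score, reliability):
    """Class-level entries for the report, from paired pre and post raw scores."""
    gains = [after - before for before, after in zip(pre, post)]
    sd_post = pstdev(post)
    return {
        "n": len(post),
        "mean_pre": mean(pre),
        "mean_post": mean(post),
        "mean_gain": mean(gains),
        "sd_pre": pstdev(pre),
        "sd_post": sd_post,
        # Standard error of measurement: sd * sqrt(1 - reliability).
        "sem_post": sd_post * sqrt(1.0 - reliability),
        "pct_mastering_pre": percent_mastering(pre, mastery_score),
        "pct_mastering_post": percent_mastering(post, mastery_score),
    }


# Hypothetical data for a class of four, for illustration only.
summary = class_summary([10, 12, 8, 14], [16, 18, 15, 19],
                        mastery_score=17, reliability=0.75)
print(summary)
```

In practice the reliability passed to `class_summary` would be estimated from the item-level responses, for example by calling `kr20` on the scored answer sheets.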

Keeping Learning Outcomes of Individuals Private

While individuals enrolled in courses should have the
results of their performance on assessment instruments and of
their performance in the course as a whole communicated to
them, this information is not properly communicated to
anyone else (Tyler & Wolf, 1974). Participants in a given
course may be sponsored by their employers. In this case,
with the permission of the student beforehand, it is proper
to release the individual's performance record in the course
to the employer. Generally the employer will be interested
in a global assessment of the individual student. In short,
this translates into reporting whether the student completed
the course successfully or not.






Another group to which a participant might direct that
the results of his or her performance record for a course
be sent are professional organizations and groups which
credit and record continuing education units. Here, again,
the persons who supervise such activities are generally
interested in some global judgment of the course instructor's
assessment of the student's performance in the course. This
usually translates into some judgment about the overall
adequacy of the individual student's performance in the
course, either as acceptable or unacceptable for the CEU credit.

There are two points to be made. The first is that only
those persons designated by individual course participants
should have any information sent to them about individual
student performance in the course. The second is that the
information needed by these groups is of the "pass" or "fail"
or "successfully completed" or "not successfully completed"
type. It is inappropriate to send the individual's detailed
learning report form with all of the information about the
person's pre- and post test results to these other groups.
The pass/fail judgment is sufficient. If the individual
wishes to share the detailed report with his or her employer or
with a professional licensing agency, he or she may do so.

This is not to say that information of the type contained
in Figures 2 through 4 should not be shared with employers
and persons responsible for supervision and recording of
CEUs. There is no problem as long as the data are group data
and what is reported is the effectiveness of the course
generally for groups of students. What is inappropriate is
to identify the performance of individual students in such
presentations, except for use by the individual student and
course instructors.

If a course has been shown to be generally effective and
the persons who administer and teach it to be competent and
responsible, there is no need for either employers or others
to have more information about an individual student's
performance in a course than to know if it was successful or
unsuccessful. There is too much opportunity for misuse of
detailed achievement data by employers or others if it is
provided. For example, a supervisor could conceivably decide
to promote one individual and not another on the basis of
detailed test scores in a course and the relative rankings of
the two persons. If both persons had passed the course this
would be inappropriate, and it might also be inappropriate
even if one person had not passed the course. There is
simply too much error in individual performance test scores,
even under the most ideal conditions, to make such inferences
and to be correct most of the time. Other information about
the individuals' skills in the work setting on a range of
tasks, about their attitudes and interests, and about their past
performance is much more crucial in making such decisions
(McClelland, 1973; Stice, 1979). There are many persons who
do not understand these points and who are prone to use a
test score as concrete evidence for a decision which should
be made by a more informed process requiring much more effort.
Test scores of persons are often abused in these matters
(McClelland, 1973).
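The point about error in individual scores can be made concrete with the standard error of measurement. The sketch below is offered only as an illustration, under assumptions that are not drawn from the report: both persons took the same test, their errors of measurement are independent and equal, and a difference must exceed 1.96 standard errors of a difference before it is treated as more than measurement error. The function names and the scores are hypothetical, and the value 1.7 is used purely as an illustrative standard error of measurement.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability)."""
    return sd * sqrt(1.0 - reliability)


def difference_exceeds_error(score_a, score_b, sem, z=1.96):
    """True only if two scores differ by more than z standard errors
    of a difference, where SE(difference) = sem * sqrt(2) for two
    scores with equal, independent errors (an assumption of this sketch).
    """
    se_difference = sem * sqrt(2.0)
    return abs(score_a - score_b) > z * se_difference


# Two hypothetical participants scoring 18 and 15 on the same test.
print(difference_exceeds_error(18, 15, sem=1.7))
```

Under these assumptions a gap of three points between two participants cannot be distinguished from measurement error, which is why relative rankings on a single test are a weak basis for decisions about individuals.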

The main value of test scores is their aggregation for
persons over replications of courses. Used in this way,
scores obtained from well designed tests can be very useful
in evaluating the effectiveness of courses (Tyler, 1974).
Another major use of tests is as part of the learning
activities which constitute the instruction for the courses
in which they are used. It is for instructional purposes
that individuals' test scores should be shared with them
immediately after the completion of tests and in reports
to them upon course completion. The individual who has
recently completed a course can interpret the results of pre-
and post tests in the context of the course activities in
which he or she has engaged. The scores of persons are
balanced and meaningful in this context and in relation to
how much the individual feels he or she has learned in areas
not measured by the tests.

It is also appropriate for individuals registering for
continuing education courses to decide if they choose to be
involved in the testing at the pre- and post test stages.
Being involved in the course will often require completing
the embedded test tasks, and these should routinely be required
as are the other activities designed to teach individuals the
content and skills of the course. If persons are seeking
CEU credits, and if appropriate pre- and post tests are
available, these ought also to be required. If persons wish
to enroll without seeking CEU credits and they wish not to
participate in testing, they should be allowed to do so.
About the only exception to this situation is where failure
to learn to criterion some particular skill or content would
result in property damage or threat to health and life. In
these cases, testing should be required for all participants.
An example cited in an earlier chapter is the proper
assessment of individuals' competence in operating dangerous
and expensive laboratory or industrial equipment before
allowing them to do so or certifying that they are competent
to do so.

Precautions to Prevent the Abuse of Test Scores

Attention to the matters discussed in the previous section
helps prevent abuse of test scores. There are other precautions
which should be observed to insure proper use of test scores
and other performance assessments of students in continuing
education courses.

Before decisions are made about the effectiveness of
courses in reaching their desired objectives, the tests
or other assessment procedures used must be determined to be
reasonably valid and reliable. Methods for doing so have been
outlined in some detail in previous chapters. Poorly designed
tests are worse than no tests. Their use may alienate
students, who are quick to see the invalidity of tests and
test items, especially at the post test stage when they can
judge how closely and how adequately the test items and
assessment procedures match the content of the course.




Instructors can be helped by good tests which are
appropriately comprehensive while at the same time being
brief and time efficient. Imposing a poorly designed test
on a short and already very full time period which is
needed for instruction is a serious aggravation to instructors
as well as to students. Any type of testing or assessment
which is developed for a course needs to be developed with
the full participation of the course instructors and should
be specifically related to the course material. If such
cooperation cannot be obtained, there is little point in
imposing an external testing or assessment procedure on a
course or an instructor. Both students and instructor are
apt to resent the intrusion, the results of the testing are
likely to be invalid, and the data are not likely to be
particularly helpful to the revision of the course and its
ultimate improvement. On the other hand, if the course
instructor can be convinced that well designed tests and
assessment procedures can be useful in promoting instruction,
and can be encouraged to become involved in designing
appropriate procedures, much will have been gained.

If a course is to be developed and offered many times
over a period of months or years, proper testing and
assessment of learning outcomes is important to the formative
evaluation of the course and its quality control throughout
its lifetime. In such cases, the investment of the initial
time and effort needed to develop good tests for the course
can be repaid many times over. The benefits derive from
having good information concerning the operation of the
course at various times, under various conditions, and
with different instructors. Often much is learned in the
design and evaluation of one course which is useful in the
design and operation of other courses.

Sometimes, if a course is to be offered only once or twice
for a special group of persons, there is no need to engage
in elaborate test development activities. However, in
most courses instructors will have notions of what it is they
expect students to learn and be able to do at the end of
the course. Beginning with these expected learning outcomes
is basic to the very design of any course of instruction. It
is often not difficult to translate these expectations into
some sort of formal assessment tasks. The Urban Storm Water
Quality Modeling course presented as an example earlier in
this chapter is just such a case. The course was not offered
many times. It was very short, being only three hours long.
The instructor had a clear notion of what he expected
students to achieve as desired learning outcomes. It was
relatively easy for him to put together 12 test items which
would give some indication of the entry level and exit level
knowledge of his students. Furthermore, the information
collected was useful to the instruction of the students.
In addition, it would provide better evidence of the
effectiveness of the course in reaching the intended
objectives than more casual information collected in a less
systematic way. Therefore, it was a good idea to develop and
use the pre- and post tests. The results shown in Figures 2
through 4 are certainly informative and helpful in making
decisions about course effectiveness. For this course, and
for similar short courses which are not to be replicated many
times, it does not make sense to devote great efforts to the
development of testing and assessment procedures. What was
done for the Urban Storm Water course was very adequate.

The manner in which the test items were developed was
also consistent with the optimum procedures described in
Chapter 10, although some individual steps were omitted and
the whole procedure took only a short time. To the extent
that an instructor has a good grasp of what he or she expects
to achieve in the teaching of a course, and to the extent
that he or she is well organized in his or her instructional
plans and activities, it is not difficult to sample appropriate
tasks from within the course activities to be used as test
items. Although the procedure looks formidable in total as
outlined in Chapter 10, it can actually be carried out quite
quickly and easily for a short course if the instructor is
skillful and well prepared in the teaching of the course
content.

It must also be remembered that any test, no matter how
well designed, cannot measure all of the important learning
outcomes for persons enrolled in a course. People have their
own reasons for enrolling in courses and often have valuable
learning outcomes unrelated to the formal objectives of the
course. These types of individual, and sometimes unexpected,
outcomes have been described in earlier chapters.

It is much more appropriate to evaluate the general
effectiveness of courses in achieving specified learning
outcomes across persons than it is to evaluate the learning
of specific persons by tests or other assessment procedures.
If the courses and programs offered by a college or
university can generally be shown to meet their intended
objectives, and if the people who operate these programs and
courses can be determined to be responsible and competent on
the basis of past performance, the claims made for future
courses in advertisements are credible for future clients.
Persons can enroll in courses of their choice for their own
purposes. What they take away from the course at its
conclusion, in terms of their own feelings of personal
relevance, utility of course content, interests, new
perceptions, and attitudes, will almost always have much more
meaning to them than any test score or set of test scores.
The best use of tests is to evaluate the effectiveness of
courses toward specific intended learning outcomes, not for
the definitive determination of how much any one individual
has learned from the experience of the course.




Making Course Evaluations Public

As has been mentioned in many places earlier in this
book, the evaluation of courses must necessarily include much
information other than the measured achievement of students
based upon testing. The perceptions of participants enrolled
in courses and their employers about the relevance, conditions,
and quality of instruction; the competence of the instructor
in the content of the course and in teaching; and records of
the operating characteristics of courses must all be used in
the evaluation of the effectiveness of courses and programs.
Other information concerning the organization of the sponsoring
program and the competence of its administration is also
involved. The specific learning outcomes resulting from
testing in specific aspects of course performance have little
meaning without this additional contextual information. By
itself, the performance data on specific aspects of the course
has little influence or utility. It is not by itself very
convincing to persons who make decisions about enrolling or
not enrolling in future courses. The tacit evaluation which
governs these types of decisions is almost always based upon
types of information other than test scores.

All of these additional types of information ought to be
collected routinely. This information, along with the
achievement outcomes, similar to those presented in Figures
2 through 4, ought to be tabulated and presented for public
examination. [illegible] the individuals who are sent
by companies to [illegible] through continuing education
courses have a right to know [illegible]
programs, instructors, and institutions [illegible] continuing
education programs. Individual engineers who may wish to
enroll in courses and their professional societies also have
a similar right to this type of information. Consequently,
there is the need to collect and present in cogent tabular,
graphic, and narrative form much additional information other
than simply specific course learning outcomes if judgments
of effectiveness of programs are to be made in a reasonable
manner.

The routine collection of this comprehensive information
about the operation of courses within continuing education
programs can be very helpful, not only to clients and
consumers of the courses but to the institution which offers
the courses. Demands for accountability are increasing.
One of the best ways of being accountable is to systematically
collect this range of information and make it widely and
publicly available to any persons wishing to examine it.
The best interests of the public which consumes the courses
and the institution offering courses and programs may be
served in this manner.






Conclusion 



This chapter has focused on the needs of various persons
involved in continuing education courses and experiences in
engineering to have good information concerning the degree to
which intended learning outcomes have been achieved and the
effectiveness with which courses have operated. Examples of
how to present and interpret achievement data based upon
pre- and post testing in key performance areas of courses
have been provided. Means for reporting the results of
learning outcomes to individual students have been described,
and precautions for protecting the privacy of individuals
and preventing the misuse of this information have been
discussed. It is argued that carrying out these types of
assessment activities is most useful for making decisions
about the effectiveness of programs and courses, and least
effective for making sharp distinctions between persons and
how much each individual has learned following a course. It
is also argued that responsible continuing education programs
should consistently seek information about the achievement
of their course enrollees, as well as much other information
about the effectiveness of instruction and course operating
characteristics. This information should be used for
improving continuing education courses and programs. It
should be summarized in cogent ways and be publicly
disseminated as part of the accountability process.


Chapter 14 



RECOMMENDATIONS FOR EVALUATED CEUs 



There is currently much interest in continuing education
in engineering and other technical fields about "evaluated"
continuing education units. The idea is that persons who
complete continuing education courses need to be held
accountable for having actually learned something. Certificates
of attendance are not acceptable to many persons and groups
as evidence of learning resulting from continuing education
activities. It is this concern which has motivated many of
the activities of The Learning Outcomes Measurement Project
with its emphasis upon ways of measuring the learning
resulting from short courses typical of those offered
through continuing education programs.



Courses Where Tests Provide Accurate Estimates of Learning

The activities of the project staff have shown that
there are a number of ways by which to measure and estimate
the degree of learning resulting from a continuing education
course. Some of the methods by which the learning outcomes
of a course may be measured depend upon the type of course
which is being considered. As has been noted earlier, the
best measure of the learning resulting from a course designed
to remediate or upgrade general knowledge and skill, such as
is required to pass State licensing examinations, is the
student's actual performance on the licensing examination.
It is possible to sample a few items from the total domain of
items on the professional examination which accurately
estimate the individual's performance on the longer test.
This can be accomplished through the use of relatively new
psychometric test construction procedures involving the use
of latent trait item analysis of test scores (Lord & Novick,
1968; Shoemaker, 1973).

Thus, for courses of this type it is possible to develop
and use short but powerful tests to determine not only the
learning outcomes of individual students after the course,
but also which students need to take the course in the first
place. Once such a test is developed for a remediation or
upgrading of general knowledge and skills course, it can be
used as a pre-test for advisement and screening purposes or
as a post test for assessment of the learning of individuals
who have completed the course. The scores of individuals on
pre- and post tests over replications of the course may also
be recorded and summed, and the mean values and standard
deviations calculated.
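The tabulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores, the offering names, and the function name summarize are invented for the example and do not come from the project data.

```python
from statistics import mean, stdev

# Hypothetical (pre, post) scores for each participant, grouped by
# replication (offering) of the course. All numbers are invented.
replications = {
    "offering_1": [(12, 18), (9, 17), (15, 19), (11, 16)],
    "offering_2": [(10, 15), (14, 20), (8, 14), (13, 18)],
}


def summarize(pairs):
    """Return n plus the mean and sample standard deviation of the
    pre-test scores, post test scores, and individual gains."""
    pre = [p for p, _ in pairs]
    post = [q for _, q in pairs]
    gain = [q - p for p, q in pairs]
    return {
        "n": len(pairs),
        "pre_mean": mean(pre), "pre_sd": stdev(pre),
        "post_mean": mean(post), "post_sd": stdev(post),
        "gain_mean": mean(gain), "gain_sd": stdev(gain),
    }


# One summary per offering, and one pooled over all offerings.
per_offering = {name: summarize(pairs) for name, pairs in replications.items()}
pooled = summarize([pair for pairs in replications.values() for pair in pairs])

for name, stats in per_offering.items():
    print(name, stats)
print("pooled", pooled)
```

The pooled summary speaks to the general effectiveness of the course, while the per-offering summaries show whether that effect is stable from one replication to the next.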

From this information it is also possible to evaluate
the course and its effectiveness as well as to evaluate the
learning of individual students. For such courses carefully
constructed pre- and post tests are very useful and very
accurate indicators of the degree of learning achieved by
individual students and the general effectiveness of the
course. The primary reason for this is that the performance
domain of interest is clearly defined as a body of knowledge
and skill thought to be basic to an area of engineering
practice and incorporated on a broad spectrum professional
examination.

Courses Where Tests are Inadequate Estimates of Learning

Evaluation of the learning outcomes of other types of
continuing education courses is more difficult. This is
because the domain of performance in which course knowledge
and skill may be applied is not well defined. It is also
because the ways in which the specific concepts and skills
acquired in the course may be applied on the job by the
engineer after the course is completed cannot be clearly
and unambiguously stated. An illustration may help clarify
this point.

* - 

Suppose a course is developed to broaden and update the
skills of mining engineers in terms of the design of drainage
and storage structures for the runoff from surface mined lands.
The course presents the latest thinking, alterations in
older theoretical models, and newly developed algorithms and
nomographs by which to make accurate and efficient calculations
of the design of these structures to meet newly developed
Federal standards for water quality control downstream from
the mining area. This is the type of course developed by
Professors Haan and Barfield (1978) titled "Hydrology and
Sedimentology of Surface Mined Lands." This particular
course was studied by the project team and tests were
developed by which to measure entry level and exit level
knowledge and skill of course participants in course content.
The problem is that no matter how good the tests are they
can never be a truly effective means of estimating all of
the important learning outcomes which may have been achieved
by any individual following completion of the course.
There are several reasons for this.

Limited Time for Testing

First, in any continuing education course dealing with
a large and complex body of information and skills, most of
the time needs to be devoted to instructional activities.
Students need to have concepts and procedures demonstrated
and they need to apply these in practice problems. Most
continuing education courses are short because practicing
engineers cannot afford to spend long periods of time in
course attendance. Thus, a short course of 2 to 4 days
duration is a common occurrence. However, a truly adequate
test of the learning outcomes of the Hydrology and
Sedimentology course would require that the student actually
construct the design specifications for a water drainage and
storage system for an actual mining operation. This would
usually require a minimum of from 6 to 8 hours depending
upon the problem characteristics. There is simply not enough
available time in the short course to devote such large
amounts of time to a "test" item. This becomes particularly
apparent when one realizes that a good test would require the
actual design of not one, but several drainage and storage
systems for different topography, climatic, soil, and mining
conditions.

Because of this problem of time, any test must be
greatly abbreviated and simplified. This simplification and
abbreviation makes the test tasks different from the
performance tasks actually involved in the design of such
structures on the job. This means that any good evaluation
should require the engineer who has completed the course to
submit a sample of his next few actual drainage and storage
designs to the course instructor. These would be designs
the engineer had actually produced on the job after the
course had been completed. The course instructor could then
evaluate the degree to which the principles and techniques
taught in the course had been accurately applied and used
by the engineer. Of course the problem is that it would take
many hours for the instructor to evaluate each actual design,
the evaluation could not be completed until some weeks after
the student had completed the short course and had time to
learn how to apply the course principles in the work setting,
[illegible] than would possibly
be available to complete such a thorough evaluation of
each learner's achievement.



Inadequacy of Test Data in [illegible]

A second problem is that even if such an elaborate
evaluation of each person's learning were adopted, it would
not be valid for some of the course participants. Rather,
it would be valid for only those persons who came to the
course with the intention of actually using the course
principles and techniques in their daily work activities in
specific design problems.

Experience of the project staff and many others has
shown that for any particular course there are a wide
variety of learners enrolled for a variety of reasons. For
example, in the Hydrology and Sedimentology course, one often
finds persons enrolled who are not normally engaged in the
design of drainage and storage structures for surface
mining. Sometimes persons attend such a course because they
have business dealings with engineering firms which do carry
out such designs. The purpose of attending the course is to
become more informed about the problems and methods used by
these designers and not to become expert in the actual design
of the structures themselves. Administrators of state
regulatory agencies, state inspectors, and other persons also
not normally engaged in the actual design of such structures
frequently attend such a course.

All of these persons may learn a great deal from the
course but none of them might be expected to put into
practice the actual principles taught in the course. The
valuable things learned by this group of persons may be the
names of research persons whom they can hire as consultants
to provide the technical assistance they need for specific
jobs; the names of resource manuals and documents as well
as computer programs useful to the solving of particular
problems; and identification of areas of expertise and
knowledge presently lacking in themselves or in members of
their organizations which need to be developed either through
the upgrading of present employees' skills or the hiring of
new employees.

All of these outcomes can be very valuable to the
participants. They are certainly all outcomes which would be
valued in terms of making a contribution to engineering
practice in the region. Yet, none of these outcomes would be
measured by a comprehensive evaluation of the actual designs
for drainage and storage structures produced by this group of
persons. Generally persons in this group would not produce
such designs when they returned to work. Rather, they would
use the knowledge they acquired to better manage their firm,
supervise employees, and obtain the services and resources
needed by the actual designers of such structures within
their firms or under their jurisdiction.

Growth of Learning After Course Completion

A third problem has to do with when the learning
resulting from the course may be expected to take place. In
highly complex and technical courses which have large
amounts of content, all that can be hoped for in a short course
as an immediate outcome is a general familiarity with the
course concepts and procedures.

Actual facility in the use of these concepts and
procedures is almost certainly dependent upon serious
continued study and attempts by the course participant to
actually apply and use the course material in his or her
work setting. Therefore, the maximum amount of learning should
not occur at the end of the short course but sometime after
the completion of the short course when the learner has had
time to do much additional study and application of
course procedures. Courses in computer programming are good
examples. They usually teach only the basic principles and
how to understand complex procedures and manuals. Facile
computer programming comes only after much actual application
of these concepts and principles in many work related
problems.

The Need for Multiple Indicators of Learning Outcomes

For all of these reasons it is important to have
multiple indicators of the degree of learning resulting from
most continuing education courses if one wishes to determine
[illegible] learning outcomes. Simple pre- and post
test scores of individuals at entry and exit from the course
are useful but not sufficient to the task. Short entry
and exit tests can provide good estimates of the degree to
which participants have learned the basics of the course
principles and procedures. If properly constructed they can
also estimate the degree to which participants know how to
go about using the materials, manuals, algorithms, nomographs,
and other procedures presented in the course for approaching
or setting up a few sample problems likely to be encountered
in their actual work setting. Performance on such tests
tells very little about other important outcomes the engineer
may have learned.

This means that the learning outcome evaluation of a
course's effect needs to include not only pre- and post tests
on the basics of the course content, but also systematic
polling of participants concerning what they think they have
learned, how much they think they have learned, how relevant
they think the learning is to their job performance, and how
likely they are to do additional study in this area and
attempt to apply course concepts and procedures to their
actual work. The intentions and perceptions of course
participants are very important.

The same is true for the perceptions of the supervisors
and employers of the course participants. The judgments of
these persons about the worth and utility of the material
[illegible] by the employee are very important. It
is the tacit evaluation of these supervisors as well as of
the engineers who enroll in such courses which is the most
potent and meaningful evaluation for any course.

If the judgment of these groups is that the course is
valuable and worthwhile in terms of improving knowledge and
skill in areas related to on-the-job performance, the course
will be highly recommended and heavily subscribed whether or
not there is any formal assessment of learning outcomes by
testing or by the examination of actual job performance or
work samples of course participants after the course is
completed. If the tacit evaluation of these professional
groups is that the course is not worthwhile, no matter how
valuable the course is shown to be in improving test scores,
it is not likely to be heavily enrolled. This professional
judgment, or tacit evaluation, is an important and legitimate
part of the information which should be routinely gathered
and incorporated in evaluations of the learning outcomes of
continuing education courses.



The Impossibility of Making "Complete" Learning
Assessments of Individuals



It should be apparent that it is an impossible task
to collect all of this wide range of information in order to
evaluate the learning of any one particular person who has
taken a particular continuing education course. There is
not time to do so, and the entire process makes unreasonable
demands upon the participant and his or her employer. How,
then, can an individual enrollee in a given course be
certified as having learned a specific amount in a given
course?

Beyond certifying that the individual has attended,
participated fully, completed all course assignments and
activities at described levels of accuracy, and has also
completed a pre- and post test which demonstrated a certain
amount of growth on some of the basic knowledge and skill
areas in the course, there is little that can be said about
an individual's actual learning outcomes of a more broad
and important nature which may result from the course.
However, it is possible to evaluate the effectiveness of the
course for participants generally in more substantial ways.
In a nutshell, it is much more desirable to certify courses
than persons.

Means for Making Comprehensive Assessment of Course
Effectiveness

Although it is impractical to obtain all the various
types of evidence needed to make a strong inference about
the broad range of learning outcomes which may result from
a particular person completing a course, it is practical to
gather this wide array of evidence across different persons
who have completed a course. While the instructor cannot
[illegible] of application
of course principles to the actual design structures of
practicing engineers for real problems from the field for
each course participant, the actual designs from two or
three persons enrolled in a course can be randomly solicited
and evaluated. Over several replications of the course this
practice reveals much information about the effects of the
course in actually assisting the engineers enrolled in
producing better structures. It also reveals much about the
variation in the degree to which the course principles are
appropriately applied following the completion of the course.
Other participants and their employers and supervisors
can be sampled and interviewed about the actual degree to
which they judge course principles and procedures are being
used. Again, not every person must be interviewed.

Other persons from among the population of past course
enrollees can be sampled and asked to complete a delayed
post test of course content and skill. It is even possible
and sometimes desirable to administer different test items
or tasks to different persons at the end of a course. This
would be done when there is a large amount of material to
test, time only to administer a few test items to any one
person, and an interest in learning something about the
effectiveness of the course over the entire large array of
items.
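The idea of giving different items to different persons can be sketched as follows. This is a minimal illustration, not the project's procedure: the function names assign_items and proportion_correct, the item and person identifiers, and the rotation scheme are all assumptions made for the example.

```python
import random


def assign_items(item_ids, person_ids, items_per_person, seed=0):
    """Give each person a short form drawn from a shuffled item pool.

    Forms are taken in rotation through the pool so that the whole pool
    is covered about equally often. Assumes items_per_person is no
    larger than the number of items.
    """
    rng = random.Random(seed)
    pool = list(item_ids)
    rng.shuffle(pool)
    forms = {}
    cursor = 0
    for person in person_ids:
        forms[person] = [pool[(cursor + k) % len(pool)]
                         for k in range(items_per_person)]
        cursor += items_per_person
    return forms


def proportion_correct(responses):
    """Estimate per-item proportion correct from partial response data.

    `responses` maps person -> {item: 1 if correct else 0}; each person
    has answered only the items on his or her own short form.
    """
    totals, correct = {}, {}
    for answers in responses.values():
        for item, score in answers.items():
            totals[item] = totals.get(item, 0) + 1
            correct[item] = correct.get(item, 0) + score
    return {item: correct[item] / totals[item] for item in totals}


# Twelve hypothetical items, six participants, four items each.
forms = assign_items(range(1, 13), ["p1", "p2", "p3", "p4", "p5", "p6"], 4)
for person, form in forms.items():
    print(person, form)
```

With these numbers every item is answered by exactly two persons, so the pooled results describe the course over the full item array while no one person spends time on more than four items.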



Such a plan used over several replications of a course
can produce much information about the course's effectiveness
in teaching its participants a variety of outcomes. Of course,
it provides little information about any individual's learning.



Logical Requirements for Evaluated Courses

Any movement toward an evaluated continuing education
unit probably ought to be based on the course developers
having to provide information about the general effectiveness
of the course. This information should be based on the
evaluation of actual on-the-job performance of samples of
persons who have completed the course. It should also
include the perceptions of samples of past enrollees and
their supervisors concerning the value and utility of the
course for improved performance in work related activities.
This information should be presented along with information
about entry and exit level knowledge and skill on the basics
of the course as these can be measured in short and time
efficient pre- and post tests. Professional licensing
agencies and other interested groups such as practicing
engineers and the firms which employ them ought to have
access to this information. The effectiveness of the course
could then be judged in a more formal way than the present
and common tacit evaluation way, but without removing this
valuable professional judgment component. Courses could be
certified as being worthy of an evaluated CEU on the basis
of this evidence.

Logical Requirements for Certification of Persons

The entry level knowledge and skill of participants,
the reasons persons enroll in a particular course, their
expectations for learning from the course, a record of their
actual participation and completion of course activities,
their actual performance on short and basic tests of key
course knowledge and skills, and their perceptions of how
much they have learned following the course should all be
routinely collected for persons enrolled in continuing
education courses. However, this assessment should be
carried out only with the consent of the enrollee. Otherwise
the results are likely to be invalid. If a course enrollee
wants to receive an evaluated CEU, or some other type of
certificate which reports learning, he or she should first
be enrolled in an approved course which has demonstrated to
the satisfaction of a professional licensing agency that the
course does indeed achieve its intended learning outcomes
in a consistent manner.

The second condition for earning an evaluated CEU or
other formal credit should be that the individual participant
be willing to complete short pre- and post tests and
questionnaires designed to obtain basic information about
individual learning which can be reliably and easily collected
on each person in short periods of time at the beginning or
end of the course. A third condition is that the participant
engage in and complete all prescribed learning activities
which comprise the course.






The Importance of Options for Participants

It is important that enrollees in a course be allowed the
option of whether or not to receive CEUs or some other form
of formal credit for learning. Some persons may be expected
to be enrolled and not be at all interested in receiving
formal documentation of their learning. However, if persons
are interested in receiving such credit, they should be
expected routinely to complete all pre- and post tests and
related short questionnaires which elicit information about
expectations and reasons for attendance, estimates of
individual achievement, as well as judgments by participants
about the utility of course content.

Persons wishing to receive formal credit also should be
informed that they and some of their employers will be sampled
in the future for follow-up interviews, delayed post testing,
evaluation of job performance, and submission of actual work
samples to be evaluated. It should be clear that the purpose
of this follow-up assessment is to determine the general
effectiveness of the course in reaching its intended learning
outcomes in order that the course may be improved and
eventually documented as being worthwhile for CEU credit.

The purpose of the follow-up assessment is not generally
for making a judgment about the individual's learning for
which personal CEUs would be awarded. Rather, successful
completion of all course activities including the testing and
learning assessments would usually be the basis for awarding
individual CEU credit.




This emphasis upon the participation of persons seeking
CEU credit in the total assessment procedures for a course does
not imply that other persons taking the course should not be
involved. There is a need to assess the learning of course
participants generally. Data gathered from across all
persons enrolled is needed to improve the teaching of the
course (formative evaluation) and to document the present
effectiveness of a course in achieving its intended learning
outcomes (summative evaluation).

Many engineers enrolled in short courses do not care about
being awarded CEUs or other formal credit for their learning.
If the testing procedures are presented only as being related
to CEUs these individuals might opt not to participate in the
assessment procedures. However, well designed assessment
procedures also serve important instructional functions which
are of benefit to all the persons enrolled in a course. This
relationship of testing to instruction is particularly obvious
in the embedded test tasks in the course which are used to
inform the learner and the course instructor of the needs and
accomplishments of the individual during the course of
instruction in order to make instructional decisions (See
Chapter 7). Quizzes, homework problems, and laboratory
exercises are typically used for this purpose.







To make these activities optional would be to remove an
important instructional component of the course for some
persons. Generally these activities should be required for
all participants because they comprise an integral part of the
course and its instructional methods.

The same relationship should hold for other types of
testing and assessment procedures as well. Whatever other
purposes they serve, tests and other assessment procedures
should serve instructional purposes. If tests and
assessment procedures are developed in the manner suggested
in Chapter [illegible] and other sections of this book, this will be
the case. In this event participation in the pre- and post
testing as well as all other assessment procedures ought to
be built in as part of the regular instructional activities
in the course. All participants should be involved in these
activities. The results will not only be useful to the
improvement of the course and the documentation of its
general effectiveness, but will aid the learning of individual
participants in a number of ways. Pre-tests inform the learner
about the specific content and objectives of the course in a
very precise way. This information is useful to the
individual in focusing attention on relevant aspects of the
course material during instruction. Post tests, when
compared to pre-test results, inform the individual learner
about his or her progress through the course in specific
topics and areas and call attention to areas in need of
additional study. Individuals are almost always interested
in the growth of their own level of knowledge and skill and
in comparing their progress with the accomplishments typical
of other participants in the course, as well as with those of
persons in other sections of the same course taught at other
times. Even when persons have no desire to receive CEU credit
they remain interested in information about their own degree
of learning and accomplishment.

Maximum participation of individuals enrolled in a
course in the learning assessment procedures can be ensured
if the learning assessment procedures are fully integrated with
the instructional procedures and if a number of policies are
followed. The testing and other assessment procedures should
be abbreviated and time efficient. All tests should be valid
and reliable. Results of tests and other learning
assessments should be shared as soon as possible with the
participants. Individual test results should be kept private
and not shared with others without the specific permission
of the individual.

The uses of the learning assessment data gathered for
the purposes of formative evaluation and documentation of
course effectiveness ought to be explained to participants
in order that they fully understand the importance of their
participation in and contribution to the evaluation of the
course. When these procedures are followed all participants
will quite naturally be involved in the assessment procedures.
They, the course instructors, and groups of persons who will
become enrolled in the course in the future will all benefit.

Conclusion

There is no simple way to evaluate complex learning
outcomes which may be expected to result from the completion
of most continuing education courses in technical fields.
If professional agencies and organizations are serious about
developing evaluated CEUs, procedures similar to those
described in this book and summarized in this chapter will
need to be developed and followed.






REFERENCES

Airasian, P. W. & Madaus, G. F. Criterion-referenced testing
in the classroom. In R. W. Tyler & R. M. Wolf (Eds.),
Crucial issues in testing. Berkeley, California:
McCutchen, 1974.

Aleamoni, L. M. Evaluation as an integral part of
instructional and faculty development. In L. P. Grayson &
J. M. Biedenbach (Eds.), Proceedings 1980 College Industry
Education Conference. Washington, D.C.: American Society for
Engineering Education, 1980, 120-123.

American Educational Research Association monograph series
on curriculum evaluation. Volumes 1 through 5. Chicago:
Rand McNally, 1967, 1968, 1969, 1970, 1970.

Bellack, A. A. & Kliebard, H. M. (Eds.). Curriculum and
evaluation. Berkeley, California: McCutchen, 1977.

Bloom, B. S. Human characteristics and school learning.
New York: McGraw Hill, 1976.

Bloom, B. S. (Ed.). Taxonomy of educational objectives:
Handbook I: Cognitive domain. New York: David McKay
Company, Inc., 1956.

Bloom, B. S., Hastings, J. T., & Madaus, G. F. Handbook on
formative and summative evaluation of student learning.
New York: McGraw Hill, 1971.

Box, G. E. P., Hunter, W. G. & Hunter, J. S. Statistics
for experimenters: An introduction to design, data
analysis, and model building. New York: John Wiley
& Sons, 1978.

Bugelski, B. R. The psychology of learning applied to
teaching (2nd ed.). Indianapolis: Bobbs-Merrill, 1971.

Carroll, J. B. A model of school learning. Teachers
College Record, 1963, 64, 723-733.

Cleaver, T. G. A controlled study of the semi-paced
teaching method. Engineering Education, 1976, 66, 323-325.

Cole, H. P. Process education. Englewood Cliffs, New
Jersey: Educational Technology Publications, 1972.




Cole, H. P. Principles and techniques for enhancing
motivation and achievement of engineering students. In
L. P. Grayson & J. M. Biedenbach (Eds.), Proceedings 1980
College Industry Education Conference. Washington, D.C.:
American Society for Engineering Education, 1980,
345-347.

Council on the Continuing Education Unit. The continuing
education unit: Criteria and guidelines. Silver Springs,
Maryland: The Council on the Continuing Education Unit, 1979.

Enell, J. W. The CEU in the 1980's: A report from a long
term user. In L. P. Grayson & J. M. Biedenbach (Eds.),
Proceedings 1980 Industry Education Conference. Washington,
D.C.: American Society for Engineering Education, 1980,
185-189.

Ericksen, S. C. Motivation for learning: A guide to the
teacher of the young adult. Ann Arbor, Michigan: University
of Michigan Press, 1974.

Ferry, R. Tests in adult non-credit education. CPd2
Newsletter, Fall, 1979.

Gage, N. L. & Berliner, D. C. Educational psychology.
Chicago: Rand McNally, 1975.

Gagne, R. M. The acquisition of knowledge. Psychological
Review, 1962, 69, 355-365.

Gagne, R. M. The psychological basis of science - A
process approach. Washington, D.C.: American Association
for the Advancement of Science, Commission on Science
Education, 1965.

Gagne, R. M. Curriculum research and the promotion of
learning. In R. Tyler, R. Gagne, & M. Scriven (Eds.),
Perspectives of curriculum evaluation. Chicago: Rand
McNally, 1967, 19-38.

Gagne, R. M. The conditions of learning (3rd ed.). New
York: Holt, 1977.

Gagne, R. M. & Briggs, L. J. Principles of instructional
design. New York: Holt, 1977.

Gagne, R. M. & Paradise, N. E. Abilities and learning sets
in knowledge acquisition. Psychological Monographs, 1961,
75 (Whole No. 518).




Greenfield, L. B. Comments on evaluation in engineering
education. Engineering Education, 1978, 68, 401-404.

Grobman, H. Evaluation activities of curriculum projects.
Chicago: Rand McNally, 1968.

Grogan, W. R. Performance-based engineering education and
what it reveals. Engineering Education, 1979, 69, 402-405.

Haan, C. T. & Barfield, B. J. Hydrology and sedimentology
of surface mined lands. Lexington, Kentucky: Office of
Continuing Education and Extension, College of Engineering,
University of Kentucky, 1978.

Hambleton, R. K. & Novick, M. R. Toward an integration of
theory and method for criterion-referenced tests.
Journal of Educational Measurement, 1973, 10, 159-170.

Heimback, C. L. To PSI and back. Engineering Education, 1979,
69, 399-401.

Holland, J. Making vocational choices: A theory of
careers. Englewood Cliffs, New Jersey: Prentice Hall, 1973.

Hoyt, D. P. The relationship between college grades and
adult achievement: A review of the literature. Research
Report No. 1. Iowa City, Iowa: American College
Testing Program, 1965.

Klus, J. P. & Jones, J. A. Engineers involved in continuing
education: A survey analysis. Washington, D.C.: American
Society for Engineering Education, 1975.

Knowles, M. S. The modern practice of adult education:
Andragogy versus pedagogy. New York: Association Press, 1970.

Kulik, J. A. & Kulik, C. C. Effectiveness of the personalized
system of instruction. Engineering Education, 1975,
65, 228-231.

Kulik, J. A., Kulik, C. C. & Cohen, P. A. A meta-analysis
of outcome studies of Keller's personalized system of
instruction. American Psychologist, 1979, 34, 307-318.

Lacefield, W. E. The evaluation of competence: Theoretical
and empirical perspectives. Unpublished doctoral dissertation,
University of Kentucky, 1980.

Lavin, D. E. The prediction of academic performance: A
theoretical analysis and review of research. New York:
Russell Sage Foundation, 1965.


Livingston, S. A. Criterion-referenced applications of
classical test theory. Journal of Educational
Measurement, 1972, 9, 13-26.

Livingston, S. A. A note on the interpretation of the
criterion-referenced reliability coefficient. Journal of
Educational Measurement, 1973, 10, 311.

Lord, F. M. & Novick, M. R. Statistical theories of mental
test scores. Reading, Massachusetts: Addison-Wesley, 1968.

Manning, W. H. The criterion problem. In P. H. DuBois &
G. D. Mayo (Eds.), Research strategies for evaluating
training. Chicago: Rand McNally, 1970, 68-7[illegible].

[Author illegible]. [Title partly illegible] norm-referenced
and criterion-referenced [illegible]: An evaluation model for
developmental growth. Unpublished doctoral dissertation,
University of Kentucky, 1978.

Marshall, J. C. & Hales, L. W. Classroom test construction.
Reading, Massachusetts: Addison-Wesley Publishing Co.,
[year illegible].

Marshall, J. C. & Hales, L. W. Essentials of testing.
Reading, Massachusetts: Addison-Wesley, 1975.

[Authors illegible], M. C. Criteria and standards:
An institutional evaluation model for CEU activity.
Paper presented at Lifelong Learner Research Conference,
[illegible], February 1, 1980. Richmond,
Virginia: Virginia Commonwealth University.

Mason, E. J. & Bramble, W. J. Understanding and conducting
research: Applications in education and the behavioral
sciences. New York: McGraw-Hill, 1976.

McClelland, D. C. Testing for competence rather than for
intelligence. American Psychologist, 1973, 28, 1-14.

McCullough, R. C. Current research on roles and competencies
of professional trainers. In L. P. Grayson & J. M. Biedenbach
(Eds.), Proceedings 1980 industry education conference.
Washington, D.C.: American Society for Engineering
Education, 1980, 282-292.






Mertens, D. M. Results of AESP's survey of program interests
for engineers. Unpublished paper. Appalachian Education
Satellite Program, University of Kentucky, 1978.

Miller, D. B. Personal vitality. Reading, Massachusetts:
Addison-Wesley Publishing Co., 1977.

Millman, J. Criterion-referenced measurement. In W. J.
Popham (Ed.), Evaluation in education: Current applications.
Berkeley, California: McCutchen, 1974.

Morris, A. J., Sherrill, P., & Sotivpri, M. The return on
investment in continuing education of engineers. (Research
report based on work partially supported by National
Science Foundation, Grant No. EPP75-21587, June 1978).

Morstain, B. R. & Smart, J. C. Reasons for participating in
adult education courses: A multivariate analysis of group
differences. Adult Education, 1974, 24 (2), 83-98.

Moss, P. J., Barfield, B. J., & Blythe, D. K. Evaluation in
continuing education: A pilot study. In L. P. Grayson &
J. M. Biedenbach (Eds.), Proceedings 7th annual frontiers in
education conference. Washington, D.C.: American Society
for Engineering Education and Institute of Electrical and
Electronics Engineers, 1977, 337-344.

Nader releases ETS report, hits tests as poor predictors of
performance. American Psychological Association,
APA Monitor, 1980, 11, 304:311.

Nunnally, J. C. Educational measurement and evaluation.
New York: McGraw Hill, 1972.

Salvendy, G. & Seymour, W. D. Prediction and development of
industrial work performance. New York: Wiley, 1973.

Scriven, M. The methodology of evaluation. In R. Tyler,
R. Gagne, & M. Scriven (Eds.), Perspectives of curriculum
evaluation. Chicago: Rand McNally, 1967, 39-83.

Shoemaker, D. M. Principles and procedures of multiple
matrix sampling. New York: Ballinger, [year illegible].

Snelbecker, G. E. Learning theory, instructional theory,
and psychoeducational design. New York: McGraw Hill, 1974.




Stice, J. E. Grades and test scores: Do they predict adult
achievement? Engineering Education, 1979, 69, 390-393.

Thorndike, R. L. (Ed.). Educational measurement (2nd ed.).
Washington, D.C.: American Council on Education, 1971.

Tyler, R. W. Constructing achievement tests. Columbus, Ohio:
Ohio State University, 1934.

Tyler, R. W. Basic principles of curriculum and instruction.
Chicago: University of Chicago Press, 1950.

Tyler, R. W. The use of tests in measuring the effectiveness
of educational programs, methods, and instructional
materials. In R. W. Tyler & R. M. Wolf (Eds.), Crucial
issues in testing. Berkeley, California: McCutchen, 1974.

Tyler, R. W. & Wolf, R. M. Crucial issues in testing.
Berkeley, California: McCutchen, 1974.

Webb, W. B. Measurement of learning in extensive training
programs. In P. H. DuBois & G. D. Mayo, Research strategies
for evaluating training. Chicago: Rand McNally, 1970,
55-65.

Wiesehuegel, R. E. Measurement of cognitive achievement in
continuing education in engineering. Unpublished doctoral
dissertation, 1978, George Peabody College for Teachers.

Wolf, R. M. Invasion of privacy. In R. W. Tyler &
R. M. Wolf (Eds.), Crucial issues in testing. Berkeley,
California: McCutchen, 1974.

Work, C. E. A nationwide study of the variability of test
scoring by different instructors. Engineering Education,
1976, 66, 241-248.

Worthen, B. R. & Sanders, J. R. Educational evaluation:
Theory and practice. Belmont, California: Wadsworth, 1973.






APPENDIX A 

This appendix contains an example of each of four different
types of data collection instruments suitable for use in
educational and evaluational situations involving short
courses. A brief description of each instrument - its
purpose, its recommended implementation mode, and its
possible utilities - is given.

The four different types of instruments serve to
provide additional information about the participants,
instructors, and operating characteristics involved in
courses. This type of descriptive information is needed to
properly interpret the results of formal assessments of
participants' learning by testing. The sample instruments
presented here, used in conjunction with the sample
learning assessment tests in Appendix B, collectively allow
strong judgments to be made concerning the effectiveness of
courses in achieving intended objectives (summative
evaluation) and in reorganizing courses to be more effective
in the future (formative evaluation).



APPENDIX A
EXAMPLE A-1

Instrument: Demographic Information Questionnaire

Purpose: A) To collect systematic data concerning
participants' personal, educational, and
employment histories.

B) To collect data concerning the relative
influence of a number of factors affecting
decisions to participate in a particular
continuing education program.

Implementation: Either A) As part of an advance mailing of
materials, to be completed and
returned by participants through the
mail or upon arrival at the course
site.

Or B) Completed by participants as an
initial activity during the first
formal meeting of the course.

Utilities: A) To identify characteristics of the
"captured" audience for contrast with those
of the intended "target" audience for
the course.

B) To provide a source of information reflecting
on the validity of evaluation methods and
outcomes concerning course effectiveness.

C) To aid faculty in selecting course content
and designing appropriate instructional
methods.

D) To aid sponsors in identifying topics and planning
advertising methods for future courses.


INSTRUCTOR(S):

DEMOGRAPHIC INFORMATION

Items          Responses          Comments

I. In what type of engineering are you currently employed?
1. Agricultural 2. Chemical 3. Civil 4. Electrical 5. Industrial
6. Mechanical 7. Mining 8. Other (PS)*
9. Not presently employed as an engineer (PS)*
Responses: 1 2 3 4 5 6 7 8 9

II. What is your highest educational degree? Please state major field in comments.
1. [illegible] 2. [illegible] 3. Bachelor's 4. Master's 5. Doctorate
6. Other (PS)*
Responses: 1 2 3 4 5 6

III. What is your sex?
1. Female 2. Male
Responses: 1 2

IV. What is your major employment affiliation?
1. [illegible] 2. Government 3. Consultant
4. Corporate (in business or industry) 5. Student 6. Unemployed
7. Other (PS)*
Responses: 1 2 3 4 5 6 7

V. Who is paying for your attendance at this course?
1. My employer 2. Myself 3. Other (PS)*
Responses: 1 2 3

VI. Did your employer recommend this course?
1. No 2. Yes
Responses: 1 2

VII. How did you hear about this course?
1. Brochure/posted 2. Brochure/mailed 3. Word of mouth 4. Newspaper
5. Professional journal 6. Radio or TV 7. Other (PS)*
Responses: 1 2 3 4 5 6 7

VIII. What is your age?
1. Less than 21 years old 2. 21-30 years old 3. 31-40 years old
4. 41-50 years old 5. More than 50 years old
Responses: 1 2 3 4 5

IX. What is your race?
1. Caucasian 2. Black 3. Oriental 4. American Indian 5. Other (PS)*
Responses: 1 2 3 4 5

*(PS) = Please specify in comments.

Have you previously attended this course or other continuing
education courses in this subject area?
1. No 2. Yes (Please list the three most recent).

Course                    Institution
1.
2.
3.

On a scale from 1 to 5 (1=very important; 5=unimportant), please rate how
important the following factors were in your decision to attend this course.
Circle your response. If the factor is not applicable to your
situation, circle number 6.

My employer recommended the course.   1 2 3 4 5 6

I was interested in the subject area.   1 2 3 4 5 6

My expenses were paid.   1 2 3 4 5 6

The host institution and/or instructor(s) were noted for their
expertise in this subject area.   1 2 3 4 5 6

I have previously attended this and/or similar courses, and have
found them to be of value.   1 2 3 4 5 6

I wanted to meet and exchange ideas with my colleagues.   1 2 3 4 5 6

I need this course to maintain my present position or to be
considered for a promotion.   1 2 3 4 5 6

I wanted to learn or refresh my knowledge and skills in this
subject area, so my job performance may be enhanced.   1 2 3 4 5 6

Please list any other factors which influenced your decision to
attend this course.
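When the importance ratings from this questionnaire are tabulated, the "not applicable" code should not be averaged in with the 1 to 5 ratings. The following is a minimal sketch of one way that might be done. The factor names and response values are hypothetical, and the report itself does not prescribe a tabulation procedure.

```python
from statistics import mean

NOT_APPLICABLE = 6  # code the form reserves for "not applicable"


def mean_importance(ratings):
    """Mean of the 1-5 ratings for one factor, ignoring 'not applicable' (6)."""
    valid = [r for r in ratings if 1 <= r <= 5]
    return mean(valid) if valid else None


# Hypothetical responses from six participants to two of the factors.
responses = {
    "employer_recommended": [1, 6, 2, 6, 3, 6],
    "interested_in_subject": [1, 1, 2, 1, 3, 1],
}

summary = {factor: mean_importance(r) for factor, r in responses.items()}
for factor, value in summary.items():
    print(factor, value)
```

Because 1 means "very important" on this form, a lower mean indicates a more influential factor. A factor that every respondent marked "not applicable" yields no mean at all rather than a misleading number.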



APPENDIX A
EXAMPLE A-2

Instrument: Participant Reaction Questionnaire

Purpose: A) To collect impressions of participants
regarding:

1) The course faculty as instructors.
2) The course content and presentation mode(s).
3) Specific learning outcome characteristics.
4) Anticipated usefulness of knowledge and
skills acquired through the course.

B) To elicit participants' general perceptions
and comments concerning the course.

C) To identify further topics and areas of
interest to participants.

Implementation: To be administered at or near the end of the final
formal meeting of the course, either before or
after any posttest. (After the posttest, after
discussion of that test, and prior to or during
closure is a recommended time for this
administration.)

Utilities: A) To evaluate effectiveness of the course in
terms of several broad process indicators.

B) To provide a source of formative information
feedback to instructors concerning participant
perceptions of content, instructional process,
and value of the course.

C) To aid sponsors in evaluating audience
receptivity to the course, its content, and
its faculty.

D) To provide information to future potential
participants regarding the perceptions of
previous participants.






INSTRUCTOR(S):

EVALUATION OF COURSE

On a scale from 1 to 5 (1=strongly agree; 5=strongly disagree),
please respond to the following aspects of the course. Circle
your response.

Instructor(s)

1. The instructor(s) was knowledgeable in the subject area.   1 2 3 4 5

2. The instructor(s) effectively communicated the knowledge
and skills presented in the course.   1 2 3 4 5

3. The instructor(s) was not receptive to your comments
and needs.   1 2 3 4 5

4. The instructor(s) use of examples and practice problems
was effective in demonstrating the knowledge and skills
presented in the course.   1 2 3 4 5

Content & Presentation

5. The text and/or reference materials were appropriate and useful.   1 2 3 4 5

6. The organization of the course content was poor.   1 2 3 4 5

7. The course content was not relevant to your work activities.   1 2 3 4 5

8. Tutorial sessions, made available during the course, would be
valuable additions to the structure of the course.   1 2 3 4 5

On a scale from 1 to 5, please respond to items 9 and 10.

9. The level of difficulty of the material presented was:
too difficult   1 2 3 4 5   too [illegible]

10. The rate of presentation of the material was:
too slow   1 2 3 4 5   too fast

11. Comments on the course instructor(s), content, and/or presentation.

Course Objectives: To disseminate current information from various institutions,
including universities, government agencies and practicing engineering
groups, for the advancement of knowledge concerning these subjects
and for implementation by field engineering applications.

Learning Outcomes

12. The course met its stated objectives.   1 2 3 4 5

13. The knowledge and skills I obtained from this course will be of no value to
me in my job.   1 2 3 4 5

14. The knowledge and skills I obtained from this course would have been difficult
to obtain elsewhere.   1 2 3 4 5

15. I met persons, other than the instructor(s), from whom I obtained valuable
knowledge and/or information.   1 2 3 4 5

16. I would not recommend this course to others in my position.   1 2 3 4 5

17. It is likely that, in the future, I will contact one or more persons, whom
I've met at this course, concerning some aspect of my work.   1 2 3 4 5

18. I feel confident I can properly use the knowledge and skills I obtained
through this course.   1 2 3 4 5

19. I believe it is not appropriate to award CEU credit for this course.   1 2 3 4 5

20. I feel confident in the validity of the results obtained from the techniques
presented in this course.   1 2 3 4 5

21. I intend to further my study into the subject area presented in this course.   1 2 3 4 5

22. My experiences in the course were interesting and enjoyable.   1 2 3 4 5

On a scale from 1 to 5 (1=very knowledgeable; 5=not knowledgeable), please answer
questions 23 and 24.

23. How knowledgeable of the course content were you prior to entering the course?   1 2 3 4 5

24. How knowledgeable of the course content are you now upon completion of the
course?   1 2 3 4 5

25. Do you intend to share the knowledge and skills you obtained in the
course with your colleagues at your place of employment?

a) No
b) Yes, I am required to do so by my employer. (How?)
c) Yes, I intend to, although I am not required to do so by my employer.

26. What aspects of the course do you feel were:
Most beneficial -

Least beneficial -

27. Comments/suggestions concerning any aspect of the course.

28. Please list three (3) topics which you would like presented in a course that
would be of value to you in your work.

1.
2.
3.
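Several agree/disagree statements on this questionnaire are worded negatively (for example, "The organization of the course content was poor"), so raw responses cannot simply be averaged. The sketch below shows one common way to put all items on a single direction before summarizing. Treating items 3, 6, 7, 13, 16, and 19 as reverse-keyed is an assumption drawn from their wording, not a scoring rule stated in the report, and the sample answers are hypothetical.

```python
# Items worded negatively on the sample form. Treating them as reverse-keyed
# is an assumption of this sketch, not a scoring rule stated in the report.
NEGATIVE_ITEMS = {3, 6, 7, 13, 16, 19}


def recode(item, response):
    """Return the response on a common scale where 1 is most favorable."""
    if not 1 <= response <= 5:
        raise ValueError("responses must be on the 1-5 scale")
    return 6 - response if item in NEGATIVE_ITEMS else response


def favorable_score(answers):
    """Mean recoded response over the answered agree/disagree items."""
    recoded = [recode(item, r) for item, r in answers.items()]
    return sum(recoded) / len(recoded)


# Hypothetical participant: agrees with positive items, disagrees with negative ones.
answers = {1: 1, 2: 2, 3: 5, 6: 4, 12: 1, 13: 5}
print(favorable_score(answers))
```

After recoding, a score near 1 indicates a favorable reaction on every item regardless of how the statement was worded. Items 9, 10, 23, and 24 use different scales and would need to be summarized separately.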






APPENDIX A
EXAMPLE A-3

Instrument: Satisfaction/Utilization Survey Form

Purpose: To gather impressionistic and factual data
regarding the applicability of the course content
and materials vis-a-vis the participant's job
roles and responsibilities.

Implementation: As a part of a follow-up study of short course
participants. To be included among follow-up
materials sent to participants three to six
months after course completion.

Utilities: A) As part of an extended course evaluation
procedure and used in conjunction with data
obtained prior to and during the course, this
instrument provides data regarding the extent
to which overall course objectives have been
obtained.

B) Allows for correlational studies and validation
of other evaluation information sources: e.g.,
in contrast to demographic data, facilitates
the identification of characteristics of the
target audience that will benefit most from
the course.

C) Provides sponsors and others with documenting
evidence of course validity and applicability to
participants in their work.

Example A-3
Satisfaction/Utilization Survey Form

1) Was the course a worthwhile professional experience for you?

____ (DEFINITELY YES)
____ (YES, MODERATELY SO)
____ (A comment?)

2A) Would you recommend this particular course to other
professionals who work in areas similar to your own?

____ (YES)
____ (NO)

2B) If YES to (2A), have you in fact done so?

____ (YES) (How many times?) ____
____ (NO)

3A) Please indicate the extent to which you have found the course
subject matter and materials applicable to your work.

____ I have found little or no relation between the
course content and my normal work.

____ I have referred to the materials and information
presented during the course on several occasions since.

____ The course content has proven moderately useful
to me in my work. I refer to that content almost
monthly.

____ I have found the course content has extensive
application in my work area and I refer to that
content frequently, perhaps on a weekly basis.

3B) In judging the value of the course in terms of the
extent of its applicability and usefulness, have there been one
or more occasions since you attended the course where the
[illegible] have been critical or otherwise very
[illegible] you in some phase of work on a particular problem,
experiment, project, plan, or other activity?

____ (YES) How many such occasions? ____

If YES, would you comment briefly on the nature of these
applications?

(Thank you)




APPENDIX A

Example A-4

Instrument: Structured Personal Interview Protocol

Purpose: To gather factual and impressionistic data
concerning aspects of short course implementation
in an industrial or business environment.

Implementation: The protocol provides a structural and substantive
format to guide discussion during a personal
interview between course designers or evaluators and
knowledgeable representatives of corporate clients:
e.g., a continuing education coordinator, a plant
training manager or supervisor, or a course
instructor.

Utilities: A) To discover corporate perceptions of course
utility and to identify further educational
needs.

B) To clarify the characteristics of corporate
personnel who comprise the course audience.

C) To identify preferred instructional procedures
and presentation modes followed by corporate
clients to implement the course (and to
contrast these with design specifications for
the course).

D) To identify formal and informal modes of course
evaluation and assessment of participant
learning presently employed in corporate
settings.

E) To gather information required for formative
and summative course evaluation.

F) To make acquaintances and friends in the field
of potential course users and to test the
marketability of new courses or course
modifications.






Example A-4
Structured Personal Interview Protocol

Questions about matters of fact and procedure:

I) Why did the company want or need a course like "Design of
Experiments"?

II) How did the company find out about and select "Design of
Experiments"?

III) What company personnel took the course?

A) Why were these persons selected?

B) How were these persons selected?

C) What incentives, benefits, or compulsions were used?

D) What consequences for subsequent employment and/or
advancement opportunities were contingent upon successful
course completion?

E) What arrangements were made for:

1) course time (i.e., regular duty, release time, off-duty,
etc.)

2) travel, meals, expenses, etc.

3) books, materials, supplies, etc.

IV) How was the course implemented by the company?

A) Scheduling meetings, films, discussions, examinations, etc.?

B) Homework and other course-related activities outside formal
class meetings?

C) Was an experienced statistical consultant on hand to help
students?

1) Was this person a company employee or an outsider?

2) Was there a consultant plan to utilize this person?

a) Did he grade or otherwise comment on homework?

b) Did he give demonstrations; act as an instructor?

c) Did he provide students with company/job-related
examples?




Example A-4 continued

V) How did the company evaluate the course and the students?

A) The course costs money; what benefits accrued to the company?
How were these measured or appraised?

B) Did the students complete "participants questionnaires" or
other similar instruments or surveys?

1) Were student "comments" solicited by course faculty and/
or training school staff?

2) Were these and/or other data sources used to:

a) justify course expense?

b) modify course design and implementation strategy?

C) Were any formal achievement tests given students?

1) Was a certain score on such a test used to indicate
successful or unsuccessful completion of the course?

2) Did such "grades" go into students' personnel files
as permanent records?

D) Were students' supervisors provided formal or informal
reports?

Questions about matters of opinion:

I) Did students like the course? Did they think it was a worthwhile
expenditure of time, effort, and money?

II) Is the company satisfied with the course as a whole?

A) What features were especially good from the company's
viewpoint?

B) What needed to be or was done differently?

III) Could or should students be selected differently?

IV) Was a statistical consultant important? To what degree was
the course "self-instructional"?

V) Would the company be interested in a formal evaluation process
aimed at measuring and reporting individual student achievement?

A) "Design of Experiments" costs about $120.00/student. Would
the company be interested enough in formal evaluation to
pay an additional $10-15 per student for this service?



APPENDIX B

SAMPLE ABBREVIATED/EMBEDDED TEST FOR COMPREHENSIVE
ASSESSMENT OF COMPLEX KNOWLEDGE AND SKILLS

The short test which follows is an actual test for
one unit in a six unit course titled Hydrology and
Sedimentology of Surface Mined Lands, by C. T. Haan and
B. J. Barfield, University of Kentucky: Office of Continuing
Education and Extension, College of Engineering, 1978.

The course is a very popular short course taught in
three day intense workshops. The enrollees are mining
engineers and others with interests in the construction of
better water drainage and storage structures for surface
mining operations. The course is highly technical and
develops an ability for participants to use a complex set
of procedures presented in the course manual in the design
of actual structures under very different types of slope,
soil, climatic, and mining conditions.

The solution of entire real problems takes several
hours and sometimes even a day or two. Therefore, the
actual teaching of the course, as well as the testing of
competence of participants at the end of a unit of instruction
or the end of the course, cannot be based upon having
participants complete actual entire problems. There simply
would not be enough time.






The sample short test presented is one way to assess
the knowledge and skill of course enrollees in the complex
content of the course. The test items range from simple
and basic understanding of principles through the application
of these to the solution of realistic, complex problems.
Persons' test scores reveal much about what the individual
has learned and what he or she may not have learned.

Similar short tests may be constructed for other units
in this course or other courses. These unit tests can be
assembled into one comprehensive test. For this course it
would take about one hour to complete such a 50 item test.
The test would abbreviate a set of realistic problems which,
if presented in full, would take many hours to complete. The
test is, thus, an efficient estimate of the learning of
persons based upon a much shorter time period of activity,
provided the items are sampled appropriately and properly
constructed.
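The assembly of unit items into a 50 item comprehensive test can be sketched as follows. This is only an illustration under stated assumptions: the item pool and item identifiers are hypothetical, and drawing roughly equal numbers of items at random from each of the six units is one possible reading of "sampled appropriately", not a procedure given in the report.

```python
import random


def assemble_test(item_pool, total_items, seed=0):
    """Draw roughly equal numbers of items from each unit's pool."""
    rng = random.Random(seed)  # fixed seed so the same form can be rebuilt
    units = sorted(item_pool)
    base, extra = divmod(total_items, len(units))
    test = []
    for i, unit in enumerate(units):
        # The first `extra` units receive one additional item each.
        k = base + (1 if i < extra else 0)
        test.extend(rng.sample(item_pool[unit], k))
    return test


# Hypothetical pool: 20 item identifiers for each of six units.
pool = {u: [f"U{u}-{n:02d}" for n in range(1, 21)] for u in range(1, 7)}
test_form = assemble_test(pool, total_items=50)
print(len(test_form))
```

A course designer might instead weight the units by instructional time or by the number of performance objectives in each. The equal allocation here is chosen only for simplicity.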

As such a test is developed, parallel forms can be
produced. This allows the use of short but comprehensive
tests for pre-tests, embedded tests, post tests, and delayed
post tests. All of these types of tests can be useful in
assessing not only the learning outcomes for a course for
individuals, but for judging the effectiveness of the
course as well.
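A simple way to use pre-test and post test scores from parallel forms is to compute each participant's gain and the mean gain for the course. The sketch below assumes hypothetical percent-correct scores and anonymous participant codes. It is an illustration, not the scoring procedure used by the authors.

```python
from statistics import mean


def gains(pre, post):
    """Posttest minus pretest score for each participant present in both."""
    return {pid: post[pid] - pre[pid] for pid in pre if pid in post}


# Hypothetical percent-correct scores on parallel forms, keyed by an
# anonymous participant code so individual results stay private.
pre = {"P01": 40, "P02": 55, "P03": 30}
post = {"P01": 70, "P02": 80, "P03": 75, "P04": 90}

individual = gains(pre, post)
course_mean_gain = mean(individual.values())
print(individual, course_mean_gain)
```

The individual gains speak to what each person learned, while the mean gain is one rough indicator of course effectiveness. Participants who took only one of the two tests (P04 here) are left out of the gain calculation.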

Although abbreviated tasks of the type included in the
sample test are never a substitute for the assessment
of learning by observing actual on-the-job performance
after the completion of a course, or by analysis of actual
work samples of persons completed after the course, the
abbreviated test tasks can be an efficient way to judge the
degree of learning resulting from a course at its conclusion.
The sample items which follow, the explanation,
commentary, and the guidelines which are included may be
helpful to understanding how such efficient but brief tests
of complex performances may be developed and assembled.
Studying Chapters 7 and 10 will also add to this understanding.

The charts, tables, and nomographs which are attached
to the sample test are taken from the Haan and Barfield (1978)
manual. They contain information needed to solve the
problems presented in the items. Coupled with the test items
they test for the ability of individuals to make proper use
of the manual and its materials in the solution of problems of
a realistic nature. These realistic problems are presented
in the test items. They are sampled from the domain of
real problems frequently encountered in the design of open
channel hydrologic drainage structures in surface mining
situations. For convenience the complete sample test is
presented in this appendix, although it occurs earlier in
Chapter 10 as Table 5.



Performance Objectives for Which Items Were Written

As pointed out in Chapter 10, the specific performance
objectives stated in operational terms need to be developed
prior to the preparation of the instructional activities
or test items by which to assess the achievement of these
expected outcomes. The performance objectives for the open
channel hydraulic structures unit of the Hydrology and
Sedimentology course are stated in Table 4 in Chapter 10.
For convenience this table of objectives is also presented
in this Appendix. It is these particular performance
objectives that the sample test items are designed to assess.
The reader should now examine the performance objectives
in Table 4, the test items developed to assess these
objectives in Table 5, and then read the additional comments
which follow. These explain the details of how each item
operates, what it is intended to measure, and why. The
example should be useful to persons wishing to construct
similar tests for units in technical courses.

Presenting the Stimulus Elements Required for Performance

Appended to the set of test items students receive is a
set of figures, charts, and tables (Figure 8). One main
objective of the course is to teach students the proper use
of these materials contained in the manual. All of the
figures and tables appended to the test booklet have to be
used to solve the problems or answer the questions, except
for Figure 3.10. Since all the figures occur in one place
with the tables after the test items, students have to
discriminate from among the entire array the particular
table or figure needed for a particular aspect of a problem.



Table 4

Performance Objectives for Open Channel
Hydraulic Structures Unit: An
Illustration of Test Construction Procedures*

Objective  Action         Description of the Performance Required and
Number     Verb(s)        the Conditions Under Which It Is to Occur

1          Describe       What happens to the value of Manning's n when
                          the boundary of a channel varies through a
                          range of structural conditions including
                          different types of vegetation, non-vegetated
                          soil aggregates, and man-made lining materials.

2          Recall,        The typical profile of flow velocities (fps)
           Recognize      for hydrologic channels of various cross
                          section shapes at typical slopes.

3          Describe,      The relationship between retardance and flow
           Adjust,        rate in a hydrologic channel and make
           Calculate      adjustments in design specifications (depth,
                          top width, hydraulic radius, slope, and cross
                          section) to produce desired freeboard and
                          channel performance given changes in
                          retardance or flow rates.

4          Calculate      By the limiting velocity method the
                          permissible flow rate for channels given
                          various slopes, required capacities, boundary
                          conditions, soil types, and channel cross
                          sections.

5          Calculate      By appropriate methods and proper use of
                          tables and charts provided, the value of
                          Manning's n for any type of channel, given the
                          boundary characteristics.

6          Calculate      The hydraulic radius of channels of differing
                          cross sections according to the appropriate
                          modification of the basic computational
                          algorithms.

7          Calculate      The design specifications for any given
                          channel including the values Vp, R, S, D, T,
                          and necessary freeboard given the
                          specifications for any two of these values and
                          information about soil type, topography, etc.

8          Design,        A hydrologic channel designed to perform to
           Diagram,       stated specifications under stated problem
           Label          conditions similar to those listed in item 7
                          above.

9          Recognize      The reasonableness of design specifications
                          obtained as the solution to a particular
                          design problem involving a hydrologic channel
                          given the problem variables.

10         Use,           Appropriately, computational short cut
           Select,        procedures, computational algorithms, and
           Doublecheck    graphic solutions to complex equations given a
                          variety of problems involving the design of
                          hydrologic channels under widely differing
                          conditions of rainfall, soil type, slope, etc.

*See Appendix B for details about how the performance
descriptions were developed and how test items were designed
to measure each objective.



Table 5

TEST FOR "OPEN CHANNEL HYDRAULICS" UNIT - Illustrating
the Mapping of Items to Performance Objectives

1.  What is a typical profile of flow velocities (fps) for
    the channel cross section represented in this figure?

2.  What happens to the value of Manning's n when an
    erodible parabolic cross section open channel is
    vegetated compared to an identical nonvegetated channel?

    A.  increases
    B.  decreases
    C.  remains unchanged
    D.  varies with runoff volume

3.  A nonvegetated trapezoidal channel through a sandy loam
    colloidal soil has originally been designed to carry 8
    cfs of water down a 4% slope. Suppose the engineer later
    decides to use a vegetated channel. What must he do to
    insure an equivalent capacity with the vegetated channel
    given the same slope, soil conditions, and channel shape?

    A.  Select a grass which will grow to a uniform
        height without clumping to assure uniform flow
        rates at the channel perimeter.
    B.  Design a somewhat deeper and wider channel to
        allow for the increased retardance of the flow
        caused by the vegetation.
    C.  Design a somewhat shallower and narrower channel
        because with vegetation a higher flow rate can
        be sustained.
    D.  Maintain the original specifications for the non-
        vegetated channel because the flow capacity will
        remain nearly unchanged.



[A channel is to be designed to carry 11.6 cfs of
clear water down a 0.7% slope. The channel material is
shale and hardpan. The channel is to be trapezoidal
with a 3:1 side slope. Use this information to answer
questions 4-8.]

4.  Using the limiting velocity method, what is the
    permissible velocity (fps) for water flowing in this
    nonvegetated channel?

    A.  6.0
    B.  3.5
    C.  2.7
    D.  4.0

5.  What is the value of Manning's n for this nonvegetated
    channel?

    A.  .037
    B.  .020
    C.  .030
    D.  .025

6.  Using Manning's equation, $V_p = \frac{1.49}{n} R^{2/3} S^{1/2}$, the
    hydraulic radius of the channel is calculated to be
    1.32 ft. The channel cross section area is found from
    $A = Q/v$ and is calculated to be 1.93 ft$^2$. The engineer
    then assumes that the channel depth should be
    approximately 1.3 feet. He also assumes that the bottom
    width, b, can be estimated from $A = bd$ where
    $b = 1.93/1.3$ or 1.48 ft. What should he do next?

    A.  Add 20% to the depth value and the bottom
        width value to provide adequate freeboard in
        case of a heavy rainstorm.
    B.  Check to see if his approximations for depth
        and bottom width are reasonable by using the
        relationship
        $R = \frac{bd + zd^2}{b + 2d\sqrt{z^2 + 1}}$
    C.  Calculate the top width of the channel by
        using the relationship $t = b + 2dz$.
    D.  Calculate the wetted perimeter value for the
        channel using the relationship $2d\sqrt{z^2 + 1}$ to
        determine flow resistance.

* Items enclosed in brackets contain information in their stems
necessary for the solution of problems contained in later items
in that group of items.



7.  What can be said about the engineer's estimates of the
    values for the depth and bottom width of the channel?

    A.  Both values are a reasonable approximation of
        the true values.
    B.  Neither value is a reasonable approximation of
        the true value.
    C.  The width estimation based on assuming a
        rectangular cross section is only slightly in error.
    D.  The depth approximation is based upon assuming
        that R = d and is quite accurate for this channel.

8.  What are the final values which are necessary for the
    depth (D), bottom width (b), and top width (T) of the
    channel if it is to operate at the capacity given in
    the first part of this problem and under the soil and
    slope conditions specified? Include the necessary
    freeboard (ft.).

    A.  D = 1.3, b = 1.5, T = 9.26
    B.  D = 1.6, b = 1.8, T = 11.1
    C.  D = 2.0, b = 7.0, T = 15.0
    D.  D = [?].4, b = 7.0, T = 18.0



[A parabolic channel is to be designed to carry 25 cfs
of water on a 4% slope. Because the soil is easily
eroded the designer decides to vegetate the channel
with fescue which is to be unmowed. Use this
information to answer questions 9-11.]

9.  What is the maximum permissible velocity for water
    flowing through this channel (fps)?

    A.  3
    B.  5
    C.  7
    D.  3.5

10. What is the retardance class for this vegetated
    channel?

    A.  A
    B.  B
    C.  C
    D.  D

11. What is the hydraulic radius of this channel?

    A.  1.1
    B.  .58
    C.  .82
    D.  [?]



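Figure 8. Properties of typical channels.

Trapezoidal Cross Section
    Cross-sectional area:  $A = bd + Zd^2$
    Wetted perimeter:      $P = b + 2d\sqrt{Z^2 + 1}$
    Hydraulic radius:      $R = \frac{bd + Zd^2}{b + 2d\sqrt{Z^2 + 1}}$
    Top width:             $t = b + 2dZ$;  $T = b + 2DZ$

Triangular Cross Section
    Cross-sectional area:  $A = Zd^2$
    Wetted perimeter:      $P = 2d\sqrt{Z^2 + 1}$
    Hydraulic radius:      $R = \frac{Zd}{2\sqrt{Z^2 + 1}}$
    Top width:             $t = 2dZ$

Parabolic Cross Section
    Cross-sectional area:  $A = \frac{2}{3}td$
    Wetted perimeter:      $P = t + \frac{8d^2}{3t}$ (approx.)
    Hydraulic radius:      $R = \frac{t^2 d}{1.5t^2 + 4d^2}$ (approx.), or about $0.67d$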



Table 10. Limiting Velocities and Tractive Forces for Open Channels
(Straight after Aging)

                                                                     Water Transporting
                                                 For Clear Water     Colloidal Silts
                                                 ------------------  ------------------
                                                            Tractive            Tractive
                                                 Velocity,  Force,   Velocity,  Force,
Material                                  n      fps        psf      fps        psf

Fine sand colloidal                       0.020  1.50       0.027    2.50       0.075
Sandy loam noncolloidal                   0.020  1.75       0.037    2.50       0.075
Silt loam noncolloidal                    0.020  2.00       0.048    3.00       0.110
Alluvial silts noncolloidal               0.020  2.00       0.048    3.50       0.150
Ordinary firm loam                        0.020  2.50       0.075    3.50       0.150
Volcanic ash                              0.020  2.50       0.075    3.50       0.150
Stiff clay very colloidal                 0.025  3.75       0.260    5.00       0.460
Alluvial silts colloidal                  0.025  3.75       0.260    5.00       0.460
Shales and hardpans                       0.025  6.00       0.670    6.00       0.670
Fine gravel                               0.020  2.50       0.075    5.00       0.320
Graded loam to cobbles when noncolloidal  0.030  3.75       0.380    5.00       0.660
Graded silts to cobbles when colloidal    0.030  4.00       0.430    5.50       0.800
Coarse gravel noncolloidal                0.025  4.00       0.300    6.00       0.670
Cobbles and shingles                      0.035  5.00       0.910    5.50       1.100

From Lane (1955).


Figure 10. Solution for Manning's equation, vegetated
waterways. Retardance classes A, B, and C. (SCS, 19[?]7)

[Nomograph panels: Retardance Class A, Retardance Class B,
Retardance Class C, Retardance Class D, Retardance Class E.
Axis: hydraulic radius, R, ft.]



Since this is an important part of what is taught in the
course, and also of what is required in the real work setting,
it is appropriate to require such tasks on the performance
test.

Producing Items Which Test for Various Levels of Skill
and Knowledge

For the most part the test items do not require
computation. Rather they require knowledge of relationships
and procedures. Items one through three test for knowledge
of basic properties and relationships. It would be possible
to develop a number of items to test for relationships other
than those presented. Those items developed and included on
the test ought to be central and important to wise use of
the procedures being taught.

Items 4 through 8 represent a problem parallel to a
practice problem given in the course for this unit. The
example problems in each of the chapters in the manual define
the functional competencies expected of students. These are
the performance objectives for each unit or chapter.
Therefore, it is best to prepare items which test for knowledge
of the procedures and methods which are required for solution
of the problems. Although the problem presented in items 4
through 8 is parallel to the practice problems used in the
course, it presents different soil, slope, and other
characteristics than were encountered in the practice problem.
The new problem is parallel with respect to the skill and
knowledge required for its solution, but is not simply another
identical problem where the individual need only substitute
in new values to obtain the correct results. Each test item
in the series of five items attempts to measure some particular
aspect of the person's knowledge and skill in using the
procedures to solve the problem. In addition, each item is
written to be independent of the other items in the series,
so that a correct answer to one item is not needed in order
to give the correct answer to a later item.
It is permissible to have a related series of items about a
common set of problem situations, as long as the answer
to any one item does not depend upon the answer to any
other item.

Items 4 and 5 test for knowledge of how to enter the
correct tables and extract the correct value for two variables
given certain problem conditions. Item 6 measures the
concepts related to proper estimation procedures in this
type of problem. Item seven is a similar item. It tests
for knowledge of when it is appropriate to apply a rule of
thumb; the rule being that for shallow, wide channels, d is
approximately equal to R.
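The table lookup behind items 4 and 5, and the calculation quoted in the stem of item 6, can be sketched as follows. The sketch rests on assumptions. It uses the usual form of Manning's equation in foot-second units, V = (1.49/n) R^(2/3) S^(1/2). It also takes the slope as 0.007 ft/ft, because that value reproduces the hydraulic radius of 1.32 ft quoted in item 6. Only a few rows of the limiting velocity table are copied in, and the function and variable names are illustrative.

```python
# Selected rows of the limiting velocity table supplied with the sample test
# (clear water columns): material -> (Manning's n, limiting velocity in fps).
LIMITING_VELOCITY_CLEAR_WATER = {
    "sandy loam noncolloidal": (0.020, 1.75),
    "stiff clay very colloidal": (0.025, 3.75),
    "shales and hardpans": (0.025, 6.00),
    "coarse gravel noncolloidal": (0.025, 4.00),
}


def hydraulic_radius_from_manning(velocity_fps, n, slope):
    """Solve V = (1.49 / n) * R**(2/3) * S**(1/2) for R, in feet."""
    return (velocity_fps * n / (1.49 * slope ** 0.5)) ** 1.5


# Problem conditions from the stem preceding item 4.
n, v_limit = LIMITING_VELOCITY_CLEAR_WATER["shales and hardpans"]
slope = 0.007          # assumed value; see the note above
discharge_cfs = 11.6

radius_ft = hydraulic_radius_from_manning(v_limit, n, slope)
area_sqft = discharge_cfs / v_limit   # A = Q / V

print(f"n = {n}, V = {v_limit} fps, R = {radius_ft:.2f} ft, A = {area_sqft:.2f} sq ft")
```

Under these assumptions the sketch gives R of about 1.32 ft and A of about 1.93 sq ft, the two values stated in item 6. In the test itself the examinee has to find the right table and the right row among all the appended material.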

Item 8 is the only item so far which requires any
computation. It requires the individual to use information
given in the original statement preceding item 4 and
additional information given in item 6. From this
information the specifications for the channel can be
calculated and the freeboard value determined. This is
the most difficult and time consuming item. A few
computational items of this type are needed in order to
insure a wide range of item difficulties and to assess persons'
knowledge and skill across the range of performance required
for solving these types of complex problems. In other units
of the course, such as the one dealing with the universal
soil loss equation with all of its very complex four parts,
it would be best not to require the working of a problem
involving all of the parts of the equation. Rather, three
of the values in the equation might be presented as already
having been determined, with the fourth to be determined from
the appropriate use of information provided in a problem or
question and through the selection and use of appropriate
nomographs, rules of thumb, tables, and approximation
procedures. Once again, items which test for knowledge of
how to test the validity of the approximate solutions
achieved by these procedures should be included, since this
is an important intended outcome for the course.
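The kind of validity check referred to above can be sketched for the trapezoidal channel of items 6 through 8. The sketch applies the trapezoidal relationships given with the sample test to the engineer's trial dimensions from item 6 (depth 1.3 ft, bottom width 1.48 ft, 3:1 side slope). It then compares the resulting area and hydraulic radius with the values the problem requires. The function name and the output layout are illustrative assumptions.

```python
import math


def trapezoid_properties(b, d, z):
    """Geometric properties of a trapezoidal channel.

    b = bottom width (ft), d = depth (ft), z = side slope (horizontal to vertical).
    """
    area = b * d + z * d ** 2
    wetted_perimeter = b + 2 * d * math.sqrt(z ** 2 + 1)
    return {
        "area": area,
        "wetted_perimeter": wetted_perimeter,
        "hydraulic_radius": area / wetted_perimeter,
        "top_width": b + 2 * d * z,
    }


# Trial dimensions assumed by the engineer in item 6.
trial = trapezoid_properties(b=1.48, d=1.3, z=3)

# Values the design has to meet, as stated in item 6.
required_area = 1.93      # sq ft
required_radius = 1.32    # ft

print(f"area:             trial {trial['area']:.2f} vs required {required_area:.2f}")
print(f"hydraulic radius: trial {trial['hydraulic_radius']:.2f} vs required {required_radius:.2f}")
print(f"top width of the trial section: {trial['top_width']:.2f} ft")
```

With these inputs the trial section has an area of about 6.99 sq ft and a hydraulic radius of about 0.72 ft, both far from the required values. A gap of this size is what the check described in option B of item 6 is meant to expose before any freeboard is added.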

Items 9, 10, and 11 are intended as another problem
series which tests for knowledge of procedure, rules of
thumb, and checking on estimation procedures for a channel of
a different shape to be designed with vegetation. The
complete question series through the checking on the
estimation procedure values and the calculation of the final
design specifications is not presented in the sample test.
However, it could be developed the same way as is illustrated
in the item 4 through 8 series. Again, because of time
constraints only one or two of these computational items
for each unit or chapter should be used, with more of the
other types of items which test for basic knowledge of
concepts, relationships, and procedures.

When a series of related items and stimulus information
at the beginning of these items is to be used by students,
it is important to tell the persons being tested that
series of items are presented in places; that the information
given in the stem of the question and the other introductory
information is needed in other questions; but that a wrong
answer to any one question does not necessarily mean that all
the remaining questions in the series will be incorrectly
answered. It is also important to enclose any question
series in a well defined bracket marked on the margins of the
test item booklet, as indicated on the sample items, in order
to indicate which items share common information in their
stems.

Some General Guidelines 

The general guidelines which follow may be helpful in
designing test items for technical courses similar to the
"Hydrology & Sedimentology" course.

1. Test for What Has Actually Been Instructed -- Present only
problems or questions which were actually instructed in the
short course itself. The Hydrology and Sedimentology text
has much additional information and detail that must
necessarily be omitted in the short course presentation.
Participants cannot be expected to have studied the
text thoroughly by the end of the short course, but they can
be expected to have an understanding of basic procedures such
as how to set up a problem; which models, assumptions, rules
of thumb, and basic parameters to use; and how to extract
desired values and graphic solutions to certain equations
from charts, tables, and nomographs. This is what the test
items should test for.



2. Be Sure Performance Objectives, Test Items, and
Demonstration Problems Are Congruent -- Use the structure of
the actual problems used as instructional example problems
as the operational description of what it is persons should
be able to do at the end of the course. Those problems
should be very clear, designed explicitly to illustrate the
concepts and procedures to be learned, and capable of being
broken down into individual test items to assess competence
in each phase of the procedures in each section of the course.

3. Develop Test Items Which Map the Full Range of Performance
-- Design several types of questions for each unit or chapter.
These should be graded in difficulty from easy to difficult
and should include:



A. Basic information and concept questions concerned
with definitions, terms, and simple concepts upon which the
other procedures depend. Item 2 on the sample test is such
an item.

B. Basic relationship questions which test for
comprehension of the relationships between physical variables
and their representation in equations, graphs, etc. For
example, a question about the point on the inflow and
outflow hydrograph where the time of maximum storage occurs
can test comprehension of such a relationship. Another
question might be written to ask why the time of maximum
storage is where the outflow and inflow hydrographs cross.
Four answers could be provided with one correct and the other
three incorrect. Items 3 and 6 on the sample
test assess this type of performance capability.

C. Procedural questions which test whether or not the
person recognizes the proper steps in setting up a problem,
working through the solution to a problem, etc. Notice the
emphasis upon recognition. In such items the basics of the
problem should be presented and several alternate ways of
setting up the problem would then be given. Only one would
be correct. The others would all be in error in some way,
because of the misapplication of a model or failure to adjust
it for a particular set of conditions, etc. This is what item
6 on the sample test is designed to do. The other items
which require the person to recognize the correct values for
Manning's n or for maximum permissible flow given soil type
and other information also do this. The person must
recognize the correct value or be able to match those
presented against those he or she looks up in a table or chart.

D. Concept application problems or questions which
test the degree to which the person understands where a
particular concept, rule of thumb, procedure, or method
applies and does not apply. This type of item can often be
written by giving the physical description of a problem and
then having the student recognize the correctness of the
approach or approaches outlined by which to solve the problem.
An example of this type of item is number 7 on the sample
test.
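The four item types above, together with the small number of computational items discussed earlier, can serve as a simple blueprint for checking that a test maps the full range of performance. The sketch below records only the classifications stated in the commentary on the sample test. Items 1, 9, 10, and 11 are left out because the commentary does not assign them to a single type. The data structure and function name are illustrative assumptions.

```python
from collections import Counter

ITEM_TYPES = {
    "A": "basic information and concepts",
    "B": "basic relationships",
    "C": "procedural recognition",
    "D": "concept application",
    "computation": "computational problem",
}

# Sample test item number -> item types, as described in the commentary.
SAMPLE_TEST_BLUEPRINT = {
    2: ["A"],
    3: ["B"],
    4: ["C"],
    5: ["C"],
    6: ["B", "C"],
    7: ["D"],
    8: ["computation"],
}


def coverage(blueprint):
    """Count the items of each type and list any type with no items."""
    counts = Counter(t for types in blueprint.values() for t in types)
    missing = [t for t in ITEM_TYPES if counts[t] == 0]
    return counts, missing


counts, missing = coverage(SAMPLE_TEST_BLUEPRINT)
for type_code, label in ITEM_TYPES.items():
    print(f"{label}: {counts[type_code]} item(s)")
print("types with no items:", missing)
```

A tally of this kind shows at a glance whether computational items remain few, as the time limits of a short course require.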

4. Prepare Brief and Time Efficient Test Items -- As is
clear from the above discussion, most items should test for
recognition of correctness of procedure, application of
concepts, setting up of problems, and reasonable variable
values as outcomes. Relatively little emphasis should be
given to computational items because there is too little
time to do so. In addition, any such test will be only a
test of basic knowledge and skill in using the ideas and
procedures in the manual, not of facility in actually
applying the ideas and concepts in a highly accurate and wise
manner. The latter outcome is desirable and can be achieved,
but not within the time limits of the short course. The
test items should be a reasonable sample of performance that
can be expected to result from the actual short course
instruction. A delayed post test or work samples of practicing
engineers can be used to assess long term continued growth of
knowledge and skills weeks or months after short course
completion if so desired.

5. Provide Materials and Information Needed to Solve the
Problems -- Information about formulas, equations, charts,
tables of values, and nomographs should be provided. Test
items should test for knowledge of how to use and apply such
relationships, not for recall of formulas and relationships.
These facts can be looked up by any practicing engineer and
routinely are. The charts, formulas, graphs, tables, and
nomographs needed to answer questions ought to be clustered
together in sections for portions of the test, both to be
available to persons and to provide a test of their ability to
discriminate from among and properly use the appropriate
equation, chart, or table. Example test items 4-6 attempt
to illustrate this practice.

6. Use Standard Procedures to Produce Good Multiple Choice
Items -- Follow the usual procedures for the design of good
multiple choice items. A set of these general procedures
is provided in a listing in Table 10. It should be
apparent that multiple choice questions are very time
efficient both from the standpoint of the time required for
completion of the test and for scoring. However, any of the
items presented in the sample test could be used as an essay,
constructed response, or problem solving item. The stems of
good multiple choice items always have this property. If the
stem is well designed it can be used in either the objective
multiple choice format or as a constructed response item.
Persons interested in the details of constructing multiple
choice items may refer to Maratuza (1977) or other similar
sources referenced in Chapters 10 and 11.



Table 10

LIST OF CONSIDERATIONS FOR PREPARING
MULTIPLE CHOICE ITEMS

1.  Is the item stem clearly written for the intended
    group of examinees?

2.  Is the item stem free of irrelevant material? (Sometimes
    in a complex problem question you may want some
    irrelevant givens to test the person's knowledge of
    which relationships to use.) See sample item 1.
    "Parabolic cross section" could be deleted, but its
    presence requires some discrimination of irrelevant
    information from relevant information.

3.  Is a problem clearly defined in the item stem?

4.  Are the choices clearly written for the intended group
    of examinees?

5.  Are the choices free of irrelevant material? (Again,
    the 3 false distractors need to be false and so may
    make use of given material which is normal to the
    problem but irrelevant to the given aspect being
    tested in a particular item.)

6.  Is there a correct answer or clearly best answer?

7.  Have words like "always", "none", or "all" been
    removed from options?

8.  Are likely examinee mistakes used to prepare incorrect
    answers or options?

9.  Is "all of the above" avoided as a distractor?

10. Are the choices arranged in a logical sequence (if
    one exists)?

11. Was the correct answer randomly positioned among the
    available options?

12. Are all repetitious words or expressions removed from
    the choices and included in the item stem? (Example -
    item 4 on the sample should have fps in the stem, not
    after each distractor.)

13. Are all of the choices of approximately the same
    length? (Persons tend to select the longest option,
    and the longest option is also more often the correct
    one.)

14. Do the item stem and choices follow standard rules of
    punctuation and grammar?

15. Are all negatives underlined? (Example - which factor
    is not related to the numerical value for Manning's n?)

16. Are grammatical cues between the item stem and the
    choices, which might give the correct answer away,
    removed?

17. Is the item format appropriate for measuring the
    intended objective?

18. Are items independent from one another in terms of the
    answer to item n + 1 not being dependent on item
    n, etc.?









